A novel method for object detection using deep learning and CAD models
Igor Garcia Ballhausen Sampaio, Luigy Machaca, José Viterbo, Joris Guérin
Computing Institute, Universidade Federal Fluminense, Brazil
LAAS-CNRS, ONERA, Université de Toulouse, France
{igorgarcia, luigyarcana, jviterbo}@id.uff.br, [email protected]

Keywords: Object Detection, CAD Models, Synthetic Image Generation, Deep Learning, Convolutional Neural Network

Abstract: Object Detection (OD) is an important computer vision problem for industry, which can be used for quality control in production lines, among other applications. Recently, Deep Learning (DL) methods have enabled practitioners to train OD models that perform well on complex real-world images. However, the adoption of these models in industry is still limited by the difficulty and the significant cost of collecting high-quality training datasets. On the other hand, when applying OD in the context of production lines, CAD models of the objects to be detected are often available. In this paper, we introduce a fully automated method that takes a CAD model of an object as input and returns a fully trained OD model for detecting this object. To do this, we created a Blender script that generates realistic labeled datasets of images containing the object, which are then used to train the OD model. The method is validated experimentally on two practical examples, showing that this approach can produce OD models that perform well on real images while being trained only on synthetic images. The proposed method has the potential to facilitate the adoption of object detection models in industry, as it is easy to adapt to new objects and highly flexible. Hence, it can result in significant cost reductions, gains in productivity and improved product quality.
1 INTRODUCTION

Recently, Deep Learning (DL) has produced excellent results for Object Detection (OD) (Liu et al., 2020). On the one hand, a typical limitation of DL is the requirement of large labeled datasets for training. Indeed, although various large databases for OD are available online, specific industrial applications always require creating custom datasets containing the objects of interest. While the scenarios present in public datasets are useful from both a research and an application standpoint, it was found that industrial applications, such as bin picking or defect inspection, have quite different characteristics that are not modeled by the existing datasets (Drost et al., 2017). As a result, methods that perform well on existing datasets sometimes show different results when applied to industrial scenarios without retraining. The process of generating a specific dataset for retraining is tedious and can be error-prone when conducted by non-professional technicians. Moreover, generating a new dataset and labeling it manually can be very time-consuming and expensive (Jabbar et al., 2017).

On the other hand, OD for industrial production lines presents the specificity that manufacturers often have access to the CAD models of the objects to detect. Thanks to advances in computer graphics techniques, such as ray tracing (Shirley and Morley, 2003), the generation of photo-realistic images is now possible. In such artificially generated images, the computer can be employed to obtain bounding box labeling for free. The use of synthetic images rendered from CAD models to train OD models has already been proposed in (Peng et al., 2015), (Rajpura et al., 2017) and (Hinterstoisser et al., 2018). However, these approaches are not automated, as they require manual scene creation by Blender artists. In addition, the objects used in these works are usually generic, such as buses, airplanes, cars or animals.

The main contribution of this paper is to present a new method for training OD models on synthetic images generated from CAD models that is fully automatic and thus well suited for industrial use. The proposed method consists of the automatic generation of realistic labeled images containing the objects to be detected, followed by the fine-tuning of a pretrained OD model on the artificial dataset. An extensive study is conducted to properly select the user-defined parameters so as to maximize the performance on real-world images. Our method is evaluated using the CAD models of two industrial objects for training, as well as real labeled images containing the objects for evaluation. The results obtained are very promising, as we manage to get F1-scores above 90% on real images while training only on synthetic images.

This paper is organized as follows. Section 2 presents related work in the fields of object detection and deep learning for industry. Section 3 provides detailed explanations about the proposed method. Section 4 describes our experiments, presents the results obtained and discusses them. Finally, conclusions and directions for future work are presented in Section 5.

2 RELATED WORK

This section presents related work about OD, industrial applications of DL-based computer vision, as well as computer vision methods using CAD models.
OD is a challenging computer vision problem that consists in locating instances of objects from predefined categories in natural images (Prasad, 2012). It has many applications in various domains, such as autonomous driving, security and medical diagnosis (Xiao et al., 2020). Deep learning techniques have emerged as a powerful strategy for learning characteristic representations directly from data and have led to significant advances in the field of generic object detection (Liu et al., 2020). In the last decade, many competitions for object detection have been held to provide large annotated datasets to the community, and to unify the benchmarks and metrics for fair comparison between proposed methods (Everingham et al., 2010), (Lin et al., 2014), (Zhou et al., 2017), (Kuznetsova et al., 2018).

Some examples of OD methods proposed within the last few years include (He et al., 2015), where the authors propose a new network structure, called SPP-net, which can generate a fixed-length representation regardless of the size/scale of the image. Other works, such as (Jana et al., 2018), aim to improve processing speed while still identifying objects in the image efficiently. Finally, deeper CNNs have led to record-breaking improvements in the detection of more general object categories, a shift which came about when DCNNs began to be successfully applied to image classification (Liu et al., 2020).
Although general-purpose OD methods have greatly improved thanks to the availability of large public datasets, the detection of instances in the industrial context must be approached differently, since annotated images are generally rare or not available. Indeed, to train a deep learning model, hundreds of annotated images for each object category are needed. Specific datasets need to be collected and annotated for each target application. This process is time-consuming and laborious, and increases the burden on operators, which goes against the goal of industrial automation (Cohen et al., 2020), (Ge et al., 2020).

A public dataset adapted to the industrial context was developed in (Drost et al., 2017). Unlike other 3D object detection datasets, this work models industrial waste collection and object inspection tasks, which often face different challenges. In addition, the evaluation criteria are focused on practical aspects, such as execution times, memory consumption, and useful measures of correctness and precision. Other examples of datasets adapted to the industrial context include (Guérin et al., 2018b) and (Guérin et al., 2018a).

Finally, in (Yang et al., 2019), a method to detect defects of tiny parts in real time was developed, based on object detection and deep learning. To improve their results, the authors consider the specificities of the industrial application in their method, such as the properties of the parts, the environmental parameters and the speed of movement of the conveyor. This is a good example of adapting OD training methods to the specific constraints of the industrial context.
The first commercial CAD programs came up in the 1970s, providing functions for 2D drawing and data archival, and evolved into the main engineering design tool (Lindsay et al., 2018), (Hirz et al., 2017). These models can provide a scalable solution for intelligent and automatic object recognition, tracking and augmentation based on generic object models (Ben-Himane et al., 2010). For example, CAD models have been used to support multi-view detection (Zhang et al., 2013). In (Peng et al., 2015), 3D models were used as the primary source of information to build object models. In other works, 3D CAD models were used as the only source of labeled data (Lin et al., 2014), (Everingham et al., 2010), but they are limited to generic categories, such as cars and motorcycles.

Figure 1: Overview of the proposed method for training an object detection network using a CAD model. (a) Training. (b) Inference.
3 PROPOSED METHOD

An overview of the proposed method can be seen in Figure 1. First, a custom Blender code is used to generate labeled training images containing the rendered CAD model in context. Then, a pretrained object detection model is fine-tuned on the generated dataset. Finally, the model can be used for inference on real images (Figure 1b).
For the automatic generation of the training images, the software Blender (Blender Online Community, 2018) is used. Blender is a powerful software for 3D design, which includes features such as modeling, rigging, simulation and rendering. Blender has a good Python API, is open-source and has good GPU support.

In order to generate a synthetic training image sample, our code requires several elements. First, a CAD model of the object of interest, as well as several other industrial CAD models, need to be available. In the experiments of this paper, we use the two objects shown in Figure 2, for which we also have real-world test images. The other objects serve as distractors to help the model focus on the right object. The CAD models for the distractors are gathered from the GrabCAD website (https://grabcad.com/). Different textures for the distractors, as well as for the background, are gathered from the Poliigon website. Finally, the color and texture of the object of interest are reproduced manually.

Once we have access to all the elements above, the generation code goes as follows. A floor and a table are created and some distractors are sampled. Using physics simulation, the distractors are dropped from a random height onto the table. The position of the object of interest is also randomly sampled. Once the 3D scene is created, textures and colors are sampled for the background and the distractors, and the entire scene is textured. Light sources and cameras are also sampled and placed randomly. Constraints on the camera pose are applied in order to ensure that the object appears in the camera view. Once the scene has been created, the rendering occurs and generates an image. By removing the light sources and making the object of interest a light source itself, we can generate another image which is used for bounding box labeling. This procedure is necessary because, even if we know the location of the object, it can be partly hidden by distractors, which would distort the labeling. Example images generated using our Blender code can be seen in Figure 3.

Figure 3: Example of images generated using our custom Blender script. (a) Adblue. (b) Yamaha logo.
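To make the generation loop described above concrete, the sketch below shows how such a scene-randomization script can be structured with Blender's Python API (bpy). It is a minimal illustration written for this article, not the authors' actual script: the table is a simple plane, the distractors are primitive cubes standing in for imported CAD meshes, and all paths, counts and sampling ranges are placeholder assumptions.

```python
# Minimal Blender (bpy) sketch of the scene-randomization loop described
# above. Placeholder geometry and ranges; run inside Blender's Python.
import random
import bpy

N_IMAGES = 100       # images per run (value used in the parameter study)
N_DISTRACTORS = 20   # best-performing distractor count (see Table 1)

def reset_scene():
    """Delete every object left over from the previous sample."""
    bpy.ops.object.select_all(action='SELECT')
    bpy.ops.object.delete()

def add_table():
    """A plane acting as the table; passive rigid body so parts rest on it."""
    bpy.ops.mesh.primitive_plane_add(size=2.0, location=(0.0, 0.0, 0.0))
    table = bpy.context.active_object
    bpy.ops.rigidbody.object_add()
    table.rigid_body.type = 'PASSIVE'

def drop_distractor():
    """Placeholder distractor: a real script would import a CAD mesh
    instead of adding a primitive cube."""
    bpy.ops.mesh.primitive_cube_add(
        size=random.uniform(0.05, 0.15),
        location=(random.uniform(-0.5, 0.5),
                  random.uniform(-0.5, 0.5),
                  random.uniform(0.5, 1.5)))
    bpy.ops.rigidbody.object_add()   # active body: falls under gravity

for i in range(N_IMAGES):
    reset_scene()
    add_table()
    for _ in range(N_DISTRACTORS):
        drop_distractor()
    # Step the physics simulation so the dropped objects settle.
    for frame in range(1, 50):
        bpy.context.scene.frame_set(frame)
    # Random light; camera-pose constraints keeping the object of interest
    # in view would be enforced here.
    bpy.ops.object.light_add(type='POINT',
                             location=(random.uniform(-2, 2),
                                       random.uniform(-2, 2),
                                       random.uniform(2, 4)))
    bpy.ops.object.camera_add(location=(0.0, -2.0, 1.5),
                              rotation=(1.1, 0.0, 0.0))
    bpy.context.scene.camera = bpy.context.active_object
    bpy.context.scene.render.filepath = f"/tmp/synth_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```

A full implementation would additionally place the object of interest, sample textures for the floor, table and distractors, and render the second pass in which the object itself is the only light source, from which the bounding box label is extracted.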
In this work, we did not train a new CNN architecture from scratch. Instead, we used one of the pretrained models provided by the TensorFlow Object Detection API (Huang et al., 2017). This approach is called transfer learning and consists in starting training from a model that already has basic feature extraction skills and is less likely to overfit the synthetic datasets. Indeed, the diversity that we can create with Blender is limited, as we cannot get an infinite amount of textures and distractors, and the diversity already encountered by the network during pre-training can help reduce overfitting. In addition, using a network pre-trained on real images can prevent the network from learning detection features that depend too much on the generation procedure.

There exist several models in the TensorFlow OD model zoo. More information on the detection performance, as well as the reference execution times, for each of the available pre-trained models can be found on the GitHub page of the API (https://github.com/tensorflow/models). In practice, the model used in this paper is the faster_rcnn_inception_v2_coco model, which provides a good trade-off between performance and speed.

Faster R-CNN, the model used in this work, takes as input an entire image and a set of object proposals. The network first processes the whole image with several convolutional and max pooling layers to produce a convolutional feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all "background" class, and another that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes. For a more detailed view of Faster R-CNN, we refer the reader to the original paper (Ren et al., 2015) or to the following tutorial (Ananth, 2019).

To train the final OD model, the TensorFlow OD API requires a specific file structure for the training images and labels. This step is carried out automatically by our script.
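As an illustration of the input format mentioned above, the TensorFlow OD API consumes TFRecord files of tf.train.Example protos with standardized feature keys. The sketch below shows one plausible way to build such an example for a single generated image; the helper name and the single-class 'part' label are our own assumptions, not the paper's code.

```python
# Hedged sketch: building one tf.train.Example in the feature layout used by
# the TensorFlow Object Detection API dataset tools.
import tensorflow as tf

def _int64(values): return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
def _bytes(values): return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
def _float(values): return tf.train.Feature(float_list=tf.train.FloatList(value=values))

def make_example(jpeg_bytes, width, height, boxes, label_id=1, label=b'part'):
    """boxes: list of (xmin, ymin, xmax, ymax) in absolute pixels."""
    feature = {
        'image/height': _int64([height]),
        'image/width': _int64([width]),
        'image/encoded': _bytes([jpeg_bytes]),
        'image/format': _bytes([b'jpeg']),
        # The API expects box corners normalized to [0, 1].
        'image/object/bbox/xmin': _float([b[0] / width for b in boxes]),
        'image/object/bbox/ymin': _float([b[1] / height for b in boxes]),
        'image/object/bbox/xmax': _float([b[2] / width for b in boxes]),
        'image/object/bbox/ymax': _float([b[3] / height for b in boxes]),
        'image/object/class/text': _bytes([label] * len(boxes)),
        'image/object/class/label': _int64([label_id] * len(boxes)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Usage, one call per rendered image:
# with tf.io.TFRecordWriter('train.record') as writer:
#     writer.write(make_example(jpeg, w, h, boxes).SerializeToString())
```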
The Blender script used for image generation has many hyperparameters that must be chosen before using it, such as the number of distractors, the number of scenes generated or the resolution of the synthetic images. Hence, we conducted a set of experiments to properly select these parameters in order to optimize the OD results for inference on real images. In this section, we explain the parameter selection procedure. In other words, we present the dataset on which the different sets of parameters were evaluated, as well as the metrics used to assess the quality of the results obtained with a given set of parameters. The results obtained for this parameter selection procedure are presented in Section 4.

The objective of this work is to validate that an object detector trained on synthetic images can generalize to real-world industrial cases. Hence, we use a test dataset composed of 380 real images containing the objects corresponding to the CAD models used for training. The bounding box annotation files for the test images are generated manually using a software called LabelImg (https://github.com/tzutalin/labelImg). This application allows us to draw and save the annotations of each image as xml files in the PASCAL VOC format (Everingham et al., 2010).
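For evaluation, these xml files must be read back as box coordinates. The short helper below is an illustrative way to do so with the Python standard library; the function name is ours, but the tag names (object, name, bndbox, xmin, ...) are those of the PASCAL VOC format written by LabelImg.

```python
# Read a PASCAL VOC annotation file into (label, xmin, ymin, xmax, ymax) tuples.
import xml.etree.ElementTree as ET

def load_voc_boxes(xml_path):
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall('object'):
        name = obj.find('name').text      # class label stored by LabelImg
        bb = obj.find('bndbox')
        boxes.append((name,
                      int(float(bb.find('xmin').text)),
                      int(float(bb.find('ymin').text)),
                      int(float(bb.find('xmax').text)),
                      int(float(bb.find('ymax').text))))
    return boxes
```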
Figure 4: Intersection over Union (IoU) computation.

In order to evaluate the quality of the trained model on real images, and thus to be able to select the best hyperparameters for image generation and training, we use the standard OD metrics presented here.

Intersection over Union (IoU) is an evaluation metric used to measure how well a predicted bounding box matches a ground-truth bounding box. For a pair of bounding boxes, IoU is defined as the area of their intersection divided by the area of their union (Figure 4). If A corresponds to the ground-truth box and B to the predicted box, then IoU is computed as:
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}, \quad (1)$$

where $|\cdot|$ denotes the area of a given shape. The numerator is called the overlap area and the denominator the combined area. IoU ranges between 0 and 1, where 1 means that the bounding boxes are identical and 0 that there is no overlap.
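As a sanity check of Eq. (1), here is a direct implementation for axis-aligned boxes given as (xmin, ymin, xmax, ymax) tuples; an illustrative helper rather than code from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes, Eq. (1)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap area (intersection): clamp to zero when the boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    # Combined area (union).
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0
```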
Precision, Recall, F1-Measure. We call confidence score the probability that an anchor box contains an object of a certain class; it is usually predicted by the classifier part of the object detector. The confidence score and IoU are used as the criteria to determine whether a detection is a true positive or a false positive. Given a minimal threshold on the confidence score for bounding box acceptance, and another threshold on IoU to identify matching boxes, a detection is considered a true positive (TP) if there exists a ground truth such that: confidence score > threshold; the predicted class matches the class of the ground truth; and IoU > threshold_IoU. The violation of either of the last two conditions generates a false positive (FP). In case multiple predictions correspond to the same ground truth, only the one with the highest confidence score counts as a true positive, while the others are considered false positives. When a ground-truth bounding box is left without any matching predicted detection, it counts as a false negative (FN). If we denote by TP, FP and FN respectively the number of true positives, false positives and false negatives in a dataset, we can define the following metrics:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad (2)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \quad (3)$$

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \quad (4)$$

A high precision means that most of the predicted boxes had a corresponding ground truth, i.e., the object detector is not producing bad predictions. A high recall means that most of the ground-truth boxes had a corresponding prediction, i.e., the object detector finds most objects in the images. The F1-score is the harmonic mean of precision and recall; it is useful when a balance between precision and recall is sought.

In the case of object detection on production lines, a low precision means that a part might sometimes be absent without the model noticing it, whereas a low recall means that the part is sometimes present and the model raises an alert anyway. For this reason, both good recall and good precision are required, and the choice of the F1-score metric seems appropriate.
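These matching rules translate directly into code. The sketch below, an illustrative implementation rather than the authors' evaluation code, greedily matches confidence-sorted detections of a single class to ground truths (reusing the iou helper above) and then applies Eqs. (2)-(4); the default thresholds are the values used in Section 4.

```python
def match_detections(dets, gts, conf_thr=0.9, iou_thr=0.5):
    """Count TP/FP/FN for one image and one class.
    dets: list of (box, confidence); gts: list of ground-truth boxes."""
    dets = sorted((d for d in dets if d[1] >= conf_thr),
                  key=lambda d: d[1], reverse=True)
    matched, tp, fp = set(), 0, 0
    for box, _ in dets:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if j not in matched and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou > iou_thr:
            matched.add(best_j)       # true positive: best unmatched ground truth
            tp += 1
        else:
            fp += 1                   # poor localization or duplicate detection
    fn = len(gts) - len(matched)      # ground truths left unmatched
    return tp, fp, fn

def precision_recall_f1(tp, fp, fn):
    """Eqs. (2)-(4); returns zeros when a denominator vanishes."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```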
Average Precision. After an OD model has been trained, the computation of precision, recall and F1-score depends on the values of the two thresholds defined above (for the confidence score and IoU). In order to properly choose the values of these thresholds, it is interesting to analyze the Precision x Recall curves. For each class, and for a given value of the IoU threshold, the confidence threshold is treated as a variable and sampled between 0 and 1 to plot a parametric curve with recall and precision as the x- and y-axes.

A class-specific object detector is considered good if the precision remains high as the recall increases, meaning that precision and recall stay high as the confidence threshold varies. Hence, to compare curves, we generally rely on a numerical metric called Average Precision (AP). Since 2010, the standard computation method for AP consists in calculating the area under the curve (AUC) of the Precision x Recall curve (Everingham et al., 2010).
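For completeness, one standard way to compute this area is the all-points interpolation used for PASCAL VOC evaluation since 2010; the sketch below is illustrative and operates on precision/recall pairs obtained by sweeping the confidence threshold.

```python
import numpy as np

def average_precision(precisions, recalls):
    """AUC of the Precision x Recall curve, PASCAL VOC 2010+ style."""
    order = np.argsort(recalls)
    r = np.concatenate(([0.0], np.asarray(recalls, float)[order], [1.0]))
    p = np.concatenate(([0.0], np.asarray(precisions, float)[order], [0.0]))
    # Precision envelope: make the curve monotonically non-increasing.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]      # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```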
4 EXPERIMENTS AND RESULTS
The results obtained for the parameter selection procedure, as well as our final evaluations, are presented here. These experiments were conducted on an Nvidia Quadro P5000 GPU and a 2.90 GHz Intel Xeon E3-1545M v5 processor with 16 GB of RAM.
The parameter selection procedure is conducted exclusively on the Yamaha logo object; the best set of parameters is then tested on the Adblue object to ensure that it also performs well. The influence of four tunable parameters on the final results is studied here. For each parameter, three values were selected for the tests. These parameters and their studied values are:

• Resolution
• Camera poses: 2, 5, 20
• Number of scenes: 20, 50, 200
• Number of distractors: 0, 5, 20

From simple preliminary experiments that are not presented here, we concluded that the number of textures used for the floor, the distractors and the support should be set to the maximum number of textures available (in our case, 7 for the floor and 6 for the two others). The parameter values used in this work were chosen empirically, that is, after several test scenarios, these values were the ones that produced the best performance with respect to the metrics.

In total, from the values selected for the four parameters, we sampled more than 30 combinations and compared the OD results on the test set of real images. For each combination tested, we trained the Faster R-CNN model on the synthetic images that were generated. We note that, for each hyperparameter combination, the experiments were repeated 10 times in order to attenuate the influence of the random components in the generation and training process. For reasons of space, it was not possible to present all results in this article. However, in order to demonstrate the importance of this parameter selection step, Table 1 shows the best and the worst configurations that were tested.

In Table 1, we can see that the distractors are an essential element in our proposed pipeline for image generation. Indeed, when removing them, we see a drop of around 23% in F1-score, on average across the 10 experiment samples. We also tried combinations with few distractors, but the F1-score results still dropped significantly. This makes sense, as the real evaluation images also contained several distractors.
Table 1: Best and worst hyperparameter configurations obtained and their corresponding results.

Parameter                      Best Case    Worst Case
Resolution                     960x540      960x540
Camera poses                   5            5
Number of scenes               20           20
Number of images               100          100
Number of distractors          20           0
Generation time (s)            1257.30      749.92
Number of samples              10           10
Precision % avg (std. dev.)    (15.41)      (16.27)
Recall % avg (std. dev.)       (4.07)       (6.45)
F1-Score % avg (std. dev.)     (10.57)      (11.49)
Table 2: Results obtained with the best set of hyperparameters.

Object         Precision %    Recall %    F1-Score %
Adblue         85.11          80.00       81.93
Yamaha logo    78.06          96.22       85.19
Another important point is that the resolution of the generated images should be greater than that of the inference images. In all of our tests, this configuration always produced the best results. It also makes sense, as it is easier to learn from a more detailed/complex model and then evaluate in a less detailed/complex scenario.

Finally, we also tried to increase the number of generated training images to see if this would lead to an increase in performance. Surprisingly, we observed that the performance dropped for the case with 20 distractors, 20 camera poses and 50 scenes (1000 images). This might mean that, when presented with too many synthetic images, the model starts overfitting to the biases introduced by our generation process; it also indicates that we do not need a large number of images to train our model. In addition to this performance drop, generating ten times more images also makes the proposed pipeline almost 25 times slower (31037.16 seconds).
In this section, we evaluate the results of the best combination of parameters (Best Case from Table 1) in more detail. These results are presented in Table 2; they correspond to using a confidence threshold of 0.9 and an IoU threshold of 0.5. From Table 2, we can see that the best parameters, identified using only the Yamaha logo, produce similar results when applied to another object (Adblue). This suggests that the proposed parameters are well suited for different objects and thus could generalize to various industrial use cases without additional parameter tuning.

Discussion
It is difficult to compare our results with other works in the literature. Indeed, as far as we know, the approach presented in this work is the first proposal of a fully automated pipeline that takes as input the CAD model of an object and outputs a trained object detection model for this object without any real image. A fair comparison would require other end-to-end systematic approaches for building OD models from CAD models, which do not exist yet. We hope that the results presented in this work can serve as a baseline for comparison in future works in this research direction.

However, we give a rough comparison with other relevant works to give an idea of how well our approach performs. In (Mazzetto et al., 2019), the detection of objects in an automobile production line was implemented using only real images of the objects. In that work, the estimated detection accuracy was around 90%, which is only about 5 to 10% better than the results obtained in our work using only synthetic images. In (Jabbar et al., 2017), the authors also train an OD model using synthetic images generated with Blender and evaluate the results on real images. However, their approach is not entirely automated, since the scenes are created manually by Blender artists to ensure photo-realism. The object used for evaluation in that work is a glass of wine, and the maximum AP obtained is 71.14%. Our systematic approach thus seems to work better, although we cannot reproduce their method on our objects, as we cannot create the scenes manually the way they would. The potentially better performance of our approach can be explained by the fact that the loss of photo-realism is compensated by the higher number of images in our synthetic datasets. Indeed, with our fully automated approach, generating more data is fast and requires no effort, unlike in (Jabbar et al., 2017).
5 CONCLUSION

This work presented a systematic approach to train object detection models for industrial scenarios, using only a CAD model of the object of interest as input. The method first generates realistic synthetic images using a custom Blender script, and then trains a Faster R-CNN OD model using the TensorFlow OD API. To understand and optimize the different parameters in the proposed pipeline, a systematic parameter selection study was conducted, using a Yamaha logo CAD model for training and real images containing the same object in context for evaluation. The selected hyperparameters were then tested on another object, showing that they can generalize to different scenarios.

Over the last decade, successful deep learning methods have been developed to tackle the challenging problem of generic object detection. However, when it comes to OD in an industrial environment, the availability of good quality data becomes a bottleneck. To address this issue, we proposed to use synthetic images for training, which is challenging as they might not reflect the high variability found in real industrial environments (objects, pieces, scenery, etc.). In addition, it is also difficult to find CAD models of specific industrial objects on which approaches can be trained, tested and compared. Thus, as a byproduct of this work, a dataset was produced and made publicly available for future research (https://github.com/igorgbs/systematic_approach_cadmodels).

Therefore, the main conclusion of this work is that it is possible to train an object detection model with excellent performance on a set of synthetic images generated from CAD models. In addition, it was shown that a large set of images is not needed to obtain significant results. Our experiments indicate that the proposed rendering process is sufficient to obtain good performance, and that the way of building and rendering the scenes is crucial for the final result.

ACKNOWLEDGEMENTS
Our work has benefited from the AI Interdisciplinary Institute ANITI. ANITI is funded by the French "Investing for the Future – PIA3" program under the Grant agreement n°ANR-19-PI3A-0004.
REFERENCES
Ananth, S. (2019). Faster R-CNN for object detection, a technical paper summary.

Ben-Himane, S., Hintestroisser, S., and Navab, N. (2010). Computer vision CAD models. US Patent App. 12/682,199.

Blender Online Community (2018). Blender - a 3D modelling and rendering package. Blender Foundation.

Cohen, J., Crispim-Junior, C., Grange-Faivre, C., and Tougne, L. (2020). CAD-based learning for egocentric object detection in industrial context. In Proceedings of the International Conference on Computer Vision Theory and Applications, volume 5, pages 644–651. SCITEPRESS.

Drost, B., Ulrich, M., Bergmann, P., Hartinger, P., and Steger, C. (2017). Introducing MVTec ITODD - a dataset for 3D object recognition in industry. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 2200–2208.

Everingham, M., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338.

Ge, C., Wang, J., Wang, J., Qi, Q., Sun, H., and Liao, J. (2020). Towards automatic visual inspection: A weakly supervised learning method for industrial applicable object detection. Computers in Industry, 121:103232.

Guérin, J., Gibaru, O., Nyiri, E., Thiery, S., and Palos, J. (2018a). Automatic construction of real-world datasets for 3D object localization using two cameras. In IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, pages 3655–3658. IEEE.

Guérin, J., Gibaru, O., Nyiri, E., Thiery, S., and Boots, B. (2018b). Semantically meaningful view selection. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1061–1066. IEEE.

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916.

Hinterstoisser, S., Lepetit, V., Wohlhart, P., and Konolige, K. (2018). On pre-trained image features and synthetic images for deep learning. In Proceedings of the European Conference on Computer Vision (ECCV).

Hirz, M., Rossbacher, P., and Gulanová, J. (2017). Future trends in CAD - from the perspective of automotive industry. Computer-Aided Design and Applications, 14(6):734–741.

Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7310–7311.

Jabbar, A., Farrawell, L., Fountain, J., and Chalup, S. K. (2017). Training deep neural networks for detecting drinking glasses using synthetic images. In International Conference on Neural Information Processing, pages 354–363. Springer.

Jana, A. P., Biswas, A., et al. (2018). YOLO based detection and classification of objects in video records, pages 2448–2452. IEEE.

Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., et al. (2018). The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982.

Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer.

Lindsay, A., Paterson, A., and Graham, I. (2018). Identifying and quantifying inefficiencies within industrial parametric CAD models. In Advances in Manufacturing Technology XXXII: Proceedings of the 16th International Conference on Manufacturing Research, volume 8, page 227. IOS Press.

Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., and Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2):261–318.

Mazzetto, M., Southier, L. F., Teixeira, M., and Casanova, D. (2019). Automatic classification of multiple objects in automotive assembly line, pages 363–369. IEEE.

Peng, X., Sun, B., Ali, K., and Saenko, K. (2015). Learning deep object detectors from 3D models. In Proceedings of the IEEE International Conference on Computer Vision, pages 1278–1286.

Prasad, D. K. (2012). Survey of the problem of object detection in real images. International Journal of Image Processing (IJIP), 6(6):441.

Rajpura, P. S., Bojinov, H., and Hegde, R. S. (2017). Object detection using deep CNNs trained on synthetic images. arXiv preprint arXiv:1706.06782.

Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99.

Shirley, P. and Morley, R. K. (2003). Realistic Ray Tracing. AK Peters/CRC Press.

Xiao, Y., Tian, Z., Yu, J., Zhang, Y., Liu, S., Du, S., and Lan, X. (2020). A review of object detection based on deep learning. Multimedia Tools and Applications, pages 1–63.

Yang, J., Li, S., Wang, Z., and Yang, G. (2019). Real-time tiny part defect detection system in manufacturing using deep learning. IEEE Access, 7:89278–89291.

Zhang, X., Yang, Y.-H., Han, Z., Wang, H., and Gao, C. (2013). Object class detection: A survey. ACM Computing Surveys (CSUR), 46(1):1–53.

Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., and Torralba, A. (2017). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.