Multi-Purposing Domain Adaptation Discriminators for Pseudo Labeling Confidence
Garrett Wilson
[email protected]
Washington State University, Pullman, WA

Diane J. Cook
[email protected]
Washington State University, Pullman, WA
ABSTRACT
Often domain adaptation is performed using a discriminator (domain classifier) to learn domain-invariant feature representations so that a classifier trained on labeled source data will generalize well to unlabeled target data. A line of research stemming from semi-supervised learning uses pseudo labeling to directly generate “pseudo labels” for the unlabeled target data and trains a classifier on the now-labeled target data, where the samples are selected or weighted based on some measure of confidence. In this paper, we propose multi-purposing the discriminator to not only aid in producing domain-invariant representations but also to provide pseudo labeling confidence.
CCS CONCEPTS
• Computing methodologies → Transfer learning; Unsupervised learning; Neural networks; • Theory of computation → Adversarial learning.

KEYWORDS
domain adaptation, pseudo labeling, instance weighting, domain-invariant features
ACM Reference Format:
Garrett Wilson and Diane J. Cook. 2019. Multi-Purposing Domain Adaptation Discriminators for Pseudo Labeling Confidence. In Proceedings of AdvML’19: Workshop on Adversarial Learning Methods for Machine Learning and Data Mining at KDD, August 5th, 2019, Anchorage, Alaska, USA. ACM, New York, NY, USA, 6 pages.
INTRODUCTION

Unsupervised domain adaptation is a problem consisting of two domains: a source domain and a target domain. Labeled source data and unlabeled target data are available for use during training, and the goal is to learn a model that performs well on data from the target domain [15, 17, 36]. As a result, this can be used to reduce the need for costly labeled data in the target domain.

A common approach for domain adaptation is to learn a domain-invariant feature representation, which in deep learning methods is typically produced by a feature extractor neural network. Intuitively, if a classifier trained on these domain-invariant features of the labeled source data performs well, then the classifier may generalize to the unlabeled target data, since the feature distributions for both domains will be highly similar. (Though performance on the target data depends on how similar the domains are, and this method may actually increase the error if the domains are too different [4, 58].) Numerous methods proposed for achieving this goal have yielded
promising results, and many of these methods use adversarial training [56].

One such adversarial domain-invariant feature learning method is the domain-adversarial neural network (DANN) [14, 15], which is a typical baseline for other variants. This method consists of a feature extractor network followed by two additional networks: a task classifier and a domain classifier (Figure 1). The network is updated by two competing objectives: (1) the feature extractor followed by the task classifier learns to correctly classify the labeled source data while the domain classifier learns to correctly predict whether the features originated from source or target data, and (2) the feature extractor learns to make the domain classifier predict the domain incorrectly. To this end, the authors propose a gradient reversal layer between the feature extractor and domain classifier so that during backpropagation, the gradient is negated when updating the feature extractor weights. More recently, Shu et al. [47] found that replacing the gradient reversal layer with adversarial alternating updates from generative adversarial networks (GANs) [18] performs better.

Pseudo labeling is a technique from semi-supervised learning that is also sometimes included in domain-invariant feature learning methods for domain adaptation [11, 41, 45, 60]. In pseudo labeling, a source classifier trained on the labeled source data is first used to label the unlabeled target data, generating “pseudo” labels that may not all be correct. Next, a target classifier can be trained in a supervised manner on the now-labeled target data. Often there is a selection criterion to utilize only pseudo-labeled data that are more likely correct (i.e., the model is more confident on those samples). Typically the selection is based on whether the softmax output prediction entropy is low enough [11, 35, 60].
The softmax output can be viewed as a probability distribution over the possible labels, so a uniform distribution over these predictions indicates the model has no idea what label to predict, whereas a very high prediction probability for one class (low entropy) indicates the model has high confidence in a prediction. Other measures of confidence include ensemble agreement (combined with softmax confidence) [41] or k-nearest neighbor agreement [45].

In this paper, we propose another selection criterion for pseudo labeling: the discriminator’s confidence. Methods such as DANN already have a discriminator, allowing it to be easily multi-purposed to not only aid in producing domain-invariant representations but also to provide pseudo labeling confidence. The domain discriminator learns to classify feature representations as either source domain or target domain, but in unsupervised domain adaptation this can also be interpreted as known label vs. unknown label, or rather, accurate vs. possibly inaccurate, assuming the task classifier performs well on the source data. Thus, we could view samples as
“confident” if the discriminator incorrectly classifies the target samples’ feature representations as originating from the source domain. Intuitively, this process may select samples that are pseudo labeled correctly, since the feature representation was close to that of data with known labels. However, our proposed approach assumes that (1) the task classifier does perform well on the labeled source data and (2) there exists sufficient similarity between domains. The first assumption is easy to verify during training. The second assumption is harder to quantify, but empirically we obtain high target domain performance.

Figure 1: Network setup for DANN that learns a domain-invariant feature representation – the basis for our proposed method. (a) Training: labeled source data and unlabeled target data pass through weight-sharing feature extractors into a task classifier (class label) and a domain classifier (“source” or “target”). (b) Testing: unlabeled target data pass through the feature extractor and task classifier to produce class labels.

To explain our proposed method, we first discuss its relationship with existing methods. Second, we describe our method in detail. Finally, we perform experiments on a variety of image datasets commonly used for domain adaptation.
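The two confidence measures discussed above can be made concrete with a minimal plain-Python sketch. The function names and the entropy threshold below are ours, for illustration only: softmax entropy treats a peaked prediction as confident, while the proposed discriminator criterion treats a target sample as confident when the domain classifier scores its features as source-like (here taking an output below 0.5 as “source” under a 0 = source, 1 = target labeling).

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy: 0 for a one-hot prediction, log(k) for uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def softmax_confident(logits, max_entropy=0.5):
    """Criterion used in prior work: low prediction entropy."""
    return entropy(softmax(logits)) < max_entropy

def discriminator_confident(d_output):
    """Proposed criterion: the discriminator (0 = source, 1 = target)
    scores the target sample's features as source-like."""
    return d_output < 0.5

# A peaked prediction is confident; a near-uniform one is not.
assert softmax_confident([8.0, 0.0, 0.0])
assert not softmax_confident([0.1, 0.0, 0.1])
```

Either predicate can then gate which pseudo-labeled target samples enter training; the paper's method instead uses the discriminator output as a continuous weight rather than a hard threshold.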
RELATED WORK

Numerous domain-invariant feature learning methods have been proposed. Some minimize a divergence such as maximum mean discrepancy [27, 29, 38], second-order statistics [32, 33, 49, 54, 57], or contrastive domain discrepancy [20]. Others use optimal transport [9, 10], graph matching [11], or reconstruction [16]. Still others learn domain-invariant features adversarially with a domain classifier [1, 14, 15, 28, 37, 46, 51, 52] or a GAN [43, 44]. The method in this paper is based on DANN, an adversarial approach.

Several domain-invariant adaptation methods also incorporate pseudo labeling to further improve performance. Some select confident samples that have low entropy [11, 60]. Others use an ensemble of networks that make independent predictions and select confident samples based on a combination of the ensemble agreement and verifying that at least one of the ensemble predictions has low entropy [41]. One method classifies with k-nearest neighbors and thus bases its confidence on agreement of the k predictions [45]. In this paper, we propose using the DANN discriminator to provide a measure of confidence for pseudo labeling.

Pseudo labeling can be viewed as conditional entropy regularization [19, 25], which, while proposed for semi-supervised learning, has also been applied in domain adaptation methods [21, 47]. Entropy regularization and pseudo labeling are based upon the cluster assumption: data are clustered by class/label and separated by low-density regions. If this is true, then decision boundaries should lie in these low-density regions [7, 25]. Entropy regularization is one way to move decision boundaries away from regions with higher density. However, this assumes that the decisions do not drastically change when approaching data points, i.e., that the model is locally Lipschitz [47]. This can be enforced with virtual adversarial training [30], which thus is typically also used when applying entropy regularization to domain adaptation [21, 47].
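As a concrete illustration of the regularizer just described (our own sketch, not code from the cited works), the conditional-entropy term simply averages the entropy of the model's predicted distributions on unlabeled data; minimizing it pushes predictions, and hence decision boundaries, away from the uncertain middle ground:

```python
import math

def conditional_entropy(batch_probs):
    """Mean Shannon entropy of the model's predicted distributions on a
    batch of unlabeled samples. Minimizing this term pushes decision
    boundaries toward low-density regions (the cluster assumption)."""
    total = 0.0
    for probs in batch_probs:
        total += -sum(p * math.log(p) for p in probs if p > 0)
    return total / len(batch_probs)

# Peaked (confident) predictions incur a small penalty; uncertain ones a large one.
peaked = [[0.98, 0.01, 0.01], [0.95, 0.04, 0.01]]
uniform = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
assert conditional_entropy(peaked) < conditional_entropy(uniform)
```

In practice this term is added to the supervised loss with a small coefficient; as the text notes, it is only well behaved when combined with a smoothness constraint such as virtual adversarial training.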
Alternative methods have also been proposed with the same effect of moving decision boundaries into lower-density regions, based on GANs [55], adversarial dropout [42], and self-ensembling [13, 22, 50]. We base our method on pseudo labeling rather than entropy regularization or these alternatives.

Pseudo labeling is related to self-training and expectation maximization. In self-training, a classifier is trained on labeled data, predicts labels of unlabeled data, and is re-trained on the previously-unlabeled data. This process is then repeated [59]. Self-training can be shown to be equivalent to a particular classification expectation maximization algorithm [2, 19]. Pseudo labeling is almost the same, except that the classifier is trained simultaneously on labeled and unlabeled data [25]. In domain adaptation, we have two separate domains, so we may instead wish to use the pseudo-labeled target data to train a separate target classifier [41]. This could be done in either one or two steps (similar to pseudo labeling or self-training, respectively).

Pseudo labeling is also related to co-training. Co-training is similar to self-training but utilizes two classifiers for two separate views of the data. Pseudo-labeled samples in which exactly one of the classifiers is confident are selected and added to the labeled training set for subsequent iterations [8]. When only one view is available, as is common in domain adaptation problems, Chen et al. [8] propose feature splits to artificially create two views.

Finally, the proposed method of using a discriminator or domain classifier to select which samples to use for adaptation is related to selection adaptation, a type of instance weighting [3, 12]. In selection adaptation, a domain classifier learns to predict which domain the samples are from. The labeled source data, weighted by a function of these domain predictions, are used to train a target classifier [12]. While related, this differs from our proposed method in several ways.
First, we do not weight the source data but rather pseudo label and weight the target data. Second, our discriminator operates at a feature level rather than a sample level. Third, our discriminator is trained jointly rather than in stages (more common in deep methods).

METHODS

We compare several alternative approaches for domain adaptation. In the first approach, no adaptation is performed. In the second, we use DANN to learn a domain-invariant feature representation. The third approach employs pseudo labeling and weights instances either by the task classifier’s softmax confidence or by a discriminator’s confidence (our proposed method).
No Adaptation

We train a feature extractor followed by a task classifier on the labeled source data only. Then we evaluate this model on the target data to see how well it generalizes without performing any domain adaptation. We expect this method to perform poorly when large differences exist between domains.
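As a toy illustration of this baseline (entirely our own construction, with a nearest-centroid classifier standing in for the feature extractor and task classifier), training only on source data can look perfect on source yet degrade on a shifted target distribution:

```python
def nearest_centroid_fit(xs, ys):
    """Source-only 'task classifier': one centroid per class label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums.setdefault(y, [0.0] * len(x))
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [s + v for s, v in zip(sums[y], x)]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

def accuracy(centroids, xs, ys):
    return sum(predict(centroids, x) == y for x, y in zip(xs, ys)) / len(ys)

# Source: two well-separated classes.
src_x = [[0.0, 0.0], [0.2, 0.0], [2.0, 2.0], [2.2, 2.0]]
src_y = [0, 0, 1, 1]
model = nearest_centroid_fit(src_x, src_y)
# Target: the same classes shifted (domain shift); accuracy degrades.
tgt_x = [[1.4, 1.4], [3.4, 3.4]]
tgt_y = [0, 1]
assert accuracy(model, src_x, src_y) == 1.0
assert accuracy(model, tgt_x, tgt_y) < accuracy(model, src_x, src_y)
```

The adaptation methods below aim to close exactly this gap without access to target labels.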
DANN

We train a feature extractor, softmax task classifier, and binary domain classifier as shown in Figure 1. The training consists of three weight updates at each iteration: (1) the feature extractor and task classifier together are trained to correctly classify labeled source data (e.g., with categorical cross entropy loss), (2) the domain classifier is trained to correctly label from which domain’s data the feature representation originated, and (3) the feature extractor is trained to fool the domain classifier. Through this process, the feature extractor learns to produce domain-invariant representations.

Rather than using a gradient reversal layer [15] for steps (2) and (3), we choose to perform a GAN-like update as used by Shu et al. [47]. For a discriminator D, a feature extractor F, source domain data D_s, and target domain data D_t, these two updates can be performed by minimizing:

    min_D  −E_{x∼D_s}[log D(F(x))] − E_{x∼D_t}[log(1 − D(F(x)))]    (1)

    min_F  −E_{x∼D_t}[log D(F(x))] − E_{x∼D_s}[log(1 − D(F(x)))]    (2)

Step (2) becomes Equation 1, updating the discriminator to correctly classify the feature representations of source and target data as “source” and “target”. Step (3) becomes Equation 2, updating the feature extractor to fool the discriminator by classifying source data as “target” and target data as “source”. The losses for these two updates can be computed with binary cross entropy.

Pseudo Labeling

Pseudo labeling can be added to DANN using the following steps. First, perform updates to the feature extractor, task classifier, and domain classifier on a batch of source and target data as in DANN (Figure 1a). Second, pseudo label a batch of target data using the task classifier and record the domain classifier predictions without updating the model (Figure 2a). Third, train a target classifier on this pseudo-labeled target data but weighted by the probability that the feature representations were generated from source data (Figure 2b).
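The losses above can be sketched in a few lines of plain Python (names are ours; D(F(x)) is represented by precomputed probabilities rather than a real network, and the discriminator's target is 1 on source features, matching the convention of Equations 1 and 2):

```python
import math

def bce(probs, labels):
    """Mean binary cross entropy; probs are D(F(x)) values in (0, 1)."""
    eps = 1e-7
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(probs)

def discriminator_loss(d_source, d_target):
    """Equation 1: D is pushed toward 1 on source features, 0 on target."""
    return bce(d_source, [1] * len(d_source)) + bce(d_target, [0] * len(d_target))

def feature_extractor_loss(d_source, d_target):
    """Equation 2: the same loss with flipped labels, so F learns to fool D."""
    return bce(d_target, [1] * len(d_target)) + bce(d_source, [0] * len(d_source))

def weighted_pseudo_label_loss(task_probs, p_source):
    """Third pseudo-labeling step: cross entropy on pseudo-labeled target
    samples, each weighted by the probability its features look source-like
    (under the opposite labeling, 0 = source, this weight is 1 - D(F(x)))."""
    total = 0.0
    for probs, w in zip(task_probs, p_source):
        pseudo = probs.index(max(probs))   # task classifier's own prediction
        total += w * -math.log(probs[pseudo] + 1e-7)
    return total / len(task_probs)

# When D separates the domains well, its own loss is low and F's is high.
d_src, d_tgt = [0.9, 0.95], [0.1, 0.05]
assert discriminator_loss(d_src, d_tgt) < feature_extractor_loss(d_src, d_tgt)
```

Alternating minimization of the first two losses replaces the gradient reversal layer; target samples whose features fool the discriminator contribute most to the third loss.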
If the domain classifier is a binary classifier where 0 is “source” and 1 is “target”, this probability can be calculated as 1 − D(F(x)). This contrasts with weighting by the task classifier’s max softmax output probability as a measure of confidence. Both methods of weighting are evaluated in the experiments.

Instance Weighting

Alternatively, we can replace pseudo labeling with instance weighting. In this case we train the target classifier on source data weighted by how target-like the feature representation appears, given by D(F(x)). As in pseudo labeling, this weighting contrasts with weighting by the task classifier’s softmax confidence. At test time, as in pseudo labeling, we use the feature extractor followed by the target classifier for predictions. Note that this method is essentially selection adaptation [12] but trained jointly and performed at a feature-level representation rather than the sample level.

Figure 2: After each DANN training step (Figure 1a), target data is (a) pseudo labeled, and then (b) a target classifier is trained on those pseudo labels but weighted by the discriminator’s predictions from (a), representing the probability the feature representation was generated from source data (i.e., the discriminator was fooled). All feature extractors share weights. At test time, the feature extractor and target classifier are used for making predictions. Note that DANN uses a task classifier whereas the above target classifier is a separate classifier and is only trained on pseudo-labeled target data.

EXPERIMENTS

We evaluate the method variations of no adaptation, DANN, instance weighting evaluated on the task or target classifiers (Instance-TaskC and Instance), pseudo labeling without the adversarial step that produces a domain-invariant feature representation (Pseudo-NoAdv), and pseudo labeling evaluated on the task and target classifiers (Pseudo-TaskC and Pseudo). Each instance weighting and pseudo labeling method is trained and evaluated both for weighting by the task classifier’s softmax confidence (task) and by the discriminator’s confidence (domain).

We train these methods on popular computer vision datasets: MNIST [23], USPS [24], SVHN [34], MNIST-M [15], SynNumbers [15], SynSigns [31], and GTSRB [48]. For MNIST ↔ USPS, we upscale USPS to 28x28 pixels to match MNIST using bilinear interpolation.
Table 1: Classification accuracy (source → target) of the methods on benchmark computer vision datasets: MNIST, USPS, SVHN, MNIST-M, SynNumbers, SynSigns, and GTSRB. The strongest task vs. domain confidence on each dataset for each method is highlighted in bold (if not the same). The best-performing method in each column is underlined. The last method (italicized) is the one we propose in this paper. This method performs best on average.

Method                MN→US   US→MN   SV→MN   MN→MN-M   SynN→SV   SynS→GTSRB   Average
No Adaptation         0.888   0.869   0.797   0.246     0.827     0.954        0.764
DANN                  0.961   0.965   0.855   0.949     0.881     0.932        0.924
Instance (task)
Instance (domain)     0.937   0.958
Pseudo-TaskC (task)
Pseudo (task)
Pseudo (domain)
For MNIST → MNIST-M, we pad MNIST with zeros (before normalization) to be 32x32 pixels and convert to RGB to match MNIST-M. For SVHN → MNIST, we pad MNIST to 32x32 and convert to RGB to match SVHN. The other datasets already have matching image sizes and depths.

For all experiments, we use the small CNN model used by Shu et al. [47], and for pseudo labeling we use the task classifier architecture for the target classifier. We train each model for 80,000 steps with Adam using a learning rate of 0.001 [47], a batch size of 128 [15], the adversarial learning rate schedule from DANN [15], and a learning rate of 0.0005 for the target classifier. Target and source domain data are fed through the model in separate batches, allowing for domain-specific batch statistics [13, 26]. For model selection, we use 1000 labeled target samples from the training datasets as a holdout validation set. The reported accuracies are the test-set accuracies of each model (the target classifier for pseudo labeling and instance weighting methods, otherwise the task classifier) at the point it performed best on the holdout validation set. Thus, since in truly “unsupervised” domain adaptation situations we would not have any labeled target data, these results can be interpreted as an upper bound for how well these methods can perform [53]. Using some labeled target examples in this way is a common approach for tuning domain adaptation methods [5, 6, 21, 39, 47, 53, 55].

The results are summarized in Table 1. As indicated in these results, the proposed method of pseudo labeling with domain confidence performs the best. We can see that in all cases at least one of the adaptation methods improves over no adaptation, at least one of the pseudo labeling methods improves over instance weighting, and at least one of the pseudo labeling methods improves over DANN. On half of the datasets and on average, using adversarial training with pseudo labeling improves results.
Typically, evaluating pseudo labeling on the target classifier is more effective than on the task classifier; though, interestingly, Pseudo-TaskC almost always outperforms DANN despite the task classifier never being updated by the pseudo labeling process. This indicates that pseudo labeling can improve the feature representation for the target domain. Finally, our primary goal was to determine whether using a discriminator’s confidence is more effective than a task classifier’s softmax confidence, which is true on average for each of the pseudo labeling methods though not for instance weighting. Thus, these experiments appear to provide evidence that pseudo labeling with a domain discriminator’s confidence may yield an improvement over a task classifier’s softmax confidence.
CONCLUSION

In this paper, we investigated how to weight samples for pseudo labeling. We proposed using a discriminator’s confidence rather than a task classifier’s softmax confidence. The results of testing these methods on computer vision datasets provide insight into the possible benefit of using a discriminator not only for producing a domain-invariant feature representation but also for weighting samples for pseudo labeling.

Future work includes hyperparameter tuning, either on the holdout set or with a method that does not require any labeled target data, such as reverse validation [15]. The method should be tested on additional datasets such as Office-31 [40] and on a greater variety of domain adaptation tasks. Additionally, we can determine whether confidence thresholding [13] improves over confidence weighting. Finally, we can investigate theory behind selecting or weighting samples for pseudo labeling or instance weighting that may indicate why or when this method will work, in addition to possible tweaks to yield improvements, such as using a function of the discriminator’s output rather than the probability directly, as is done in selection adaptation [12].
REFERENCES

[1] Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, and Mario Marchand. 2014. Domain-adversarial neural networks. arXiv preprint arXiv:1412.4446 (2014).
[2] Massih-Reza Amini and Patrick Gallinari. 2002. Semi-supervised logistic regression. In ECAI. 390–394.
[3] Oscar Beijbom. 2012. Domain adaptations for computer vision applications. arXiv preprint arXiv:1211.4860 (2012).
[4] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning 79, 1 (May 2010), 151–175. https://doi.org/10.1007/s10994-009-5152-4
[5] Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, and Dumitru Erhan. 2016. Domain Separation Networks. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 343–351. http://papers.nips.cc/paper/6254-domain-separation-networks.pdf
[6] Fabio Maria Carlucci, Lorenzo Porzi, Barbara Caputo, Elisa Ricci, and Samuel Rota Bulò. 2017. AutoDIAL: Automatic domain alignment layers. In ICCV. IEEE, 5077–5085.
[7] Olivier Chapelle and Alexander Zien. 2005. Semi-supervised classification by low density separation. In AISTATS, Vol. 2005. Citeseer, 57–64.
[8] Minmin Chen, Kilian Q. Weinberger, and John Blitzer. 2011. Co-Training for Domain Adaptation. In Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2456–2464. http://papers.nips.cc/paper/4433-co-training-for-domain-adaptation.pdf
[9] Nicolas Courty, Rémi Flamary, Amaury Habrard, and Alain Rakotomamonjy. 2017. Joint distribution optimal transportation for domain adaptation. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 3730–3739. http://papers.nips.cc/paper/6963-joint-distribution-optimal-transportation-for-domain-adaptation.pdf
[10] Bharath Bhushan Damodaran, Benjamin Kellenberger, Rémi Flamary, Devis Tuia, and Nicolas Courty. 2018. DeepJDOT: Deep Joint Distribution Optimal Transport for Unsupervised Domain Adaptation. In Computer Vision – ECCV 2018, Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.). Springer International Publishing, Cham, 467–483.
[11] Debasmit Das and C. S. George Lee. 2018. Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation. In Artificial Neural Networks and Machine Learning – ICANN 2018, Věra Kůrková, Yannis Manolopoulos, Barbara Hammer, Lazaros Iliadis, and Ilias Maglogiannis (Eds.). Springer International Publishing, Cham, 342–352.
[12] Hal Daumé III. 2012. A course in machine learning. ciml.info.
[13] Geoffrey French, Michal Mackiewicz, and Mark Fisher. 2018. Self-ensembling for visual domain adaptation. In
International Conference on Learning Representations. https://openreview.net/forum?id=rkpoTaxA-
[14] Yaroslav Ganin and Victor Lempitsky. 2015. Unsupervised Domain Adaptation by Backpropagation. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research), Francis Bach and David Blei (Eds.), Vol. 37. PMLR, 1180–1189. http://proceedings.mlr.press/v37/ganin15.html
[15] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-Adversarial Training of Neural Networks. Journal of Machine Learning Research 17, 59 (2016), 1–35. http://jmlr.org/papers/v17/15-239.html
[16] Muhammad Ghifary, W. Bastiaan Kleijn, Mengjie Zhang, David Balduzzi, and Wen Li. 2016. Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation. In Computer Vision – ECCV 2016, Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 597–613.
[17] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
[18] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
[19] Yves Grandvalet and Yoshua Bengio. 2005. Semi-supervised Learning by Entropy Minimization. In Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou (Eds.). MIT Press, 529–536. http://papers.nips.cc/paper/2740-semi-supervised-learning-by-entropy-minimization.pdf
[20] Guoliang Kang, Lu Jiang, Yi Yang, and Alexander G. Hauptmann. 2019. Contrastive Adaptation Network for Unsupervised Domain Adaptation. arXiv preprint arXiv:1901.00976 (2019).
[21] Abhishek Kumar, Prasanna Sattigeri, Kahini Wadhawan, Leonid Karlinsky, Rogerio Feris, Bill Freeman, and Gregory Wornell. 2018. Co-regularized Alignment for Unsupervised Domain Adaptation. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Associates, Inc., 9345–9356. http://papers.nips.cc/paper/8146-co-regularized-alignment-for-unsupervised-domain-adaptation.pdf
[22] Samuli Laine and Timo Aila. 2017. Temporal Ensembling for Semi-Supervised Learning. In International Conference on Learning Representations. https://openreview.net/forum?id=BJ6oOfqge
[23] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. 1998. The MNIST database of handwritten digits. Retrieved August 16, 2018 from http://yann.lecun.com/exdb/mnist/
[24] Y. LeCun, O. Matan, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, and H. S. Baird. 1990. Handwritten zip code recognition with multilayer networks. In Proceedings, 10th International Conference on Pattern Recognition, Vol. 2. 35–40. https://doi.org/10.1109/ICPR.1990.119325
[25] Dong-Hyun Lee. 2013. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, Vol. 3. 2.
[26] Yanghao Li, Naiyan Wang, Jianping Shi, Xiaodi Hou, and Jiaying Liu. 2018. Adaptive Batch Normalization for practical domain adaptation. Pattern Recognition.
[27] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I. Jordan. 2015. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research), Francis Bach and David Blei (Eds.), Vol. 37. PMLR, 97–105. http://proceedings.mlr.press/v37/long15.html
[28] Mingsheng Long, Zhangjie Cao, Jianmin Wang, and Michael I. Jordan. 2018. Conditional Adversarial Domain Adaptation. In
Advances in Neural InformationProcessing Systems 31 , S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Associates, Inc., 1640–1650. http://papers.nips.cc/paper/7436-conditional-adversarial-domain-adaptation.pdf[29] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I. Jordan. 2017. DeepTransfer Learning with Joint Adaptation Networks. In
Proceedings of the 34thInternational Conference on Machine Learning (Proceedings of Machine LearningResearch) , Doina Precup and Yee Whye Teh (Eds.), Vol. 70. PMLR, InternationalConvention Centre, Sydney, Australia, 2208–2217. http://proceedings.mlr.press/v70/long17a.html[30] T. Miyato, S. Maeda, S. Ishii, and M. Koyama. 2018. Virtual Adversarial Training:A Regularization Method for Supervised and Semi-Supervised Learning.
IEEETransactions on Pattern Analysis and Machine Intelligence (2018), 1–1. https://doi.org/10.1109/TPAMI.2018.2858821[31] Boris Moiseev, Artem Konev, Alexander Chigorin, and Anton Konushin. 2013.Evaluation of Traffic Sign Recognition Methods Trained on Synthetically Gener-ated Data. In
Advanced Concepts for Intelligent Vision Systems , Jacques Blanc-Talon,Andrzej Kasinski, Wilfried Philips, Dan Popescu, and Paul Scheunders (Eds.).Springer International Publishing, Cham, 576–583.[32] Pietro Morerio, Jacopo Cavazza, and Vittorio Murino. 2018. Minimal-EntropyCorrelation Alignment for Unsupervised Deep Domain Adaptation. In
Interna-tional Conference on Learning Representations . https://openreview.net/forum?id=rJWechg0Z[33] Pietro Morerio and Vittorio Murino. 2017. Correlation Alignment by RiemannianMetric for Domain Adaptation. arXiv preprint arXiv:1705.08180 (2017).[34] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and An-drew Y Ng. 2011. Reading digits in natural images with unsupervised featurelearning. In
NIPS workshop on deep learning and unsupervised feature learning ,Vol. 2011. 5.[35] Avital Oliver, Augustus Odena, Colin A Raffel, Ekin Dogus Cubuk, and Ian Good-fellow. 2018. Realistic Evaluation of Deep Semi-Supervised Learning Algorithms.In
Advances in Neural Information Processing Systems 31 , S. Bengio, H. Wallach,H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Asso-ciates, Inc., 3235–3246. http://papers.nips.cc/paper/7585-realistic-evaluation-of-deep-semi-supervised-learning-algorithms.pdf[36] Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning.
IEEETransactions on Knowledge and Data Engineering
22, 10 (Oct 2010), 1345–1359.https://doi.org/10.1109/TKDE.2009.191[37] Sanjay Purushotham, Wilka Carvalho, Tanachat Nilanon, and Yan Liu. 2017. Vari-ational adversarial deep domain adaptation for health care time series analysis.In
International Conference on Learning Representations . https://openreview.net/forum?id=rk9eAFcxg[38] A. Rozantsev, M. Salzmann, and P. Fua. 2019. Beyond Sharing Weights forDeep Domain Adaptation.
IEEE Transactions on Pattern Analysis and MachineIntelligence
41, 4 (April 2019), 801–814. https://doi.org/10.1109/TPAMI.2018.2814042[39] Paolo Russo, Fabio M. Carlucci, Tatiana Tommasi, and Barbara Caputo. 2018.From Source to Target and Back: Symmetric Bi-Directional Adaptive GAN. In
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) .[40] Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. 2010. Adapting VisualCategory Models to New Domains. In
Computer Vision – ECCV 2010 , KostasDaniilidis, Petros Maragos, and Nikos Paragios (Eds.). Springer Berlin Heidelberg,Berlin, Heidelberg, 213–226.[41] Kuniaki Saito, Yoshitaka Ushiku, and Tatsuya Harada. 2017. Asymmetric Tri-training for Unsupervised Domain Adaptation. In
Proceedings of the 34th International Conference on Machine Learning (Proceedings of Machine Learning Research), Doina Precup and Yee Whye Teh (Eds.), Vol. 70. PMLR, 2988–2997. http://proceedings.mlr.press/v70/saito17a.html
[42] Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, and Kate Saenko. 2018. Adversarial Dropout Regularization. In
International Conference on Learning Representations. https://openreview.net/forum?id=HJIoJWZCZ
[43] Swami Sankaranarayanan, Yogesh Balaji, Carlos D. Castillo, and Rama Chellappa. 2018. Generate to Adapt: Aligning Domains Using Generative Adversarial Networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Swami Sankaranarayanan, Yogesh Balaji, Arpit Jain, Ser Nam Lim, and Rama Chellappa. 2018. Learning From Synthetic Data: Addressing Domain Shift for Semantic Segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Ozan Sener, Hyun Oh Song, Ashutosh Saxena, and Silvio Savarese. 2016. Learning Transferrable Representations for Unsupervised Domain Adaptation. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 2110–2118. http://papers.nips.cc/paper/6360-learning-transferrable-representations-for-unsupervised-domain-adaptation.pdf
[46] Jian Shen, Yanru Qu, Weinan Zhang, and Yong Yu. 2018. Wasserstein Distance Guided Representation Learning for Domain Adaptation. In Thirty-Second AAAI Conference on Artificial Intelligence.
[47] Rui Shu, Hung Bui, Hirokazu Narui, and Stefano Ermon. 2018. A DIRT-T Approach to Unsupervised Domain Adaptation. In International Conference on Learning Representations. https://openreview.net/forum?id=H1q-TM-AW
[48] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. 2011. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In The 2011 International Joint Conference on Neural Networks. 1453–1460. https://doi.org/10.1109/IJCNN.2011.6033395
[49] Baochen Sun and Kate Saenko. 2016. Deep CORAL: Correlation Alignment for Deep Domain Adaptation. In Computer Vision – ECCV 2016 Workshops, Gang Hua and Hervé Jégou (Eds.). Springer International Publishing, Cham, 443–450.
[50] Antti Tarvainen and Harri Valpola. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 1195–1204. http://papers.nips.cc/paper/6719-mean-teachers-are-better-role-models-weight-averaged-consistency-targets-improve-semi-supervised-deep-learning-results.pdf
[51] Eric Tzeng, Judy Hoffman, Trevor Darrell, and Kate Saenko. 2015. Simultaneous Deep Transfer Across Domains and Tasks. In The IEEE International Conference on Computer Vision (ICCV).
[52] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. 2017. Adversarial Discriminative Domain Adaptation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Jiawei Wang, Zhaoshui He, Chengjian Feng, Zhouping Zhu, Qinzhuang Lin, Jun Lv, and Shengli Xie. 2018. Domain Confusion with Self Ensembling for Unsupervised Adaptation. arXiv preprint arXiv:1810.04472 (2018).
[54] Yifei Wang, Wen Li, Dengxin Dai, and Luc Van Gool. 2017. Deep Domain Adaptation by Geodesic Distance Minimization. In The IEEE International Conference on Computer Vision (ICCV) Workshops.
[55] Kai-Ya Wei and Chiou-Ting Hsu. 2018. Generative Adversarial Guided Learning for Domain Adaptation. British Machine Vision Conference (2018).
[56] Garrett Wilson and Diane J. Cook. 2019. A Survey of Unsupervised Deep Domain Adaptation. arXiv preprint arXiv:1812.02849 (2019).
[57] Y. Zhang, N. Wang, S. Cai, and L. Song. 2018. Unsupervised Domain Adaptation by Mapped Correlation Alignment.
IEEE Access (2018).
[58] Han Zhao, Remi Tachet des Combes, Kun Zhang, and Geoffrey J. Gordon. 2019. On Learning Invariant Representations for Domain Adaptation. arXiv preprint arXiv:1901.09453 (2019).
[59] Xiaojin Jerry Zhu. 2005.
Semi-supervised learning literature survey. Technical Report. University of Wisconsin-Madison Department of Computer Sciences.
[60] Yang Zou, Zhiding Yu, B.V.K. Vijaya Kumar, and Jinsong Wang. 2018. Unsupervised Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training. In The European Conference on Computer Vision (ECCV).