Cross-domain attribute representation based on convolutional neural network
Guohui Zhang, Gaoyuan Liang, Fang Su, Fanxin Qu, Jing-Yan Wang
Huawei Technologies Co., Ltd., Shanghai, China [email protected]
Jiangsu University of Technology, Jiangsu 213001, China [email protected]
Shaanxi University of Science & Technology, Xi’an, China
Northwestern Polytechnical University, Xi’an, China
Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, China [email protected]
Abstract.
In domain-transfer learning, we learn a model for prediction in a target domain from the data of both some source domains and the target domain, where the target domain lacks labels while the source domains have sufficient labels. Besides the instances of the data, the attributes of data shared across domains have recently been explored and proven to be very helpful for leveraging the information of different domains. In this paper, we propose a novel learning framework for domain-transfer learning based on both instances and attributes. We propose to embed the attributes of different domains by a shared convolutional neural network (CNN), to learn a domain-independent CNN model that represents the information shared by different domains by matching across domains, and to learn a domain-specific CNN model that represents the information of each domain. The concatenation of the three CNN model outputs is used to predict the class label. An iterative algorithm based on the gradient descent method is developed to learn the parameters of the model. The experiments over benchmark datasets show the advantage of the proposed model.
Keywords:
Convolutional Neural Network, Domain-Transfer Learning, Attribute Embedding.
Introduction
In machine learning, domain transfer learning has recently attracted much attention [23,26]. Transfer learning refers to the problem of learning a predictive model for a target domain by leveraging the data from both the target domain and one or more auxiliary domains. One shortcoming of traditional transfer learning methods is that the attributes of the data are not used by the classification model, although the attributes of the data are actually stable across domains. Thus using the attributes of the data is critical for transfer learning [21,19]. Peng et al. [19] proposed to represent the attribute vectors of each data point by using an attribute dictionary. Each data point is reconstructed by the elements in the dictionary, and the reconstruction coefficients are used as the new representation of the attributes. The attribute vector of a data point is mapped to the new representation vector by a linear transformation matrix so that the new representation vector is linked to the attribute vector. To leverage the auxiliary and the target domains, the same attribute representation method is applied to both auxiliary and target domains. The learning process is regularized by the intra-class similarity in the auxiliary domains, and by the neighborhood in the target domain.
Su et al. [21] proposed a low-rank attribute embedding method for the problem of multi-camera person re-identification. The method treats multi-camera person re-identification as a multi-task learning problem and uses both low-level features and mid-level attributes as the input of the identification model. The embedding of attributes maps the attributes to a continuous space to explore the correlation between each pair of attributes and also recovers the missing attributes. Both of these attribute representation methods are based on linear transformations. However, a simple linear function may be insufficient to represent the attributes effectively.
In this paper, we propose a novel attribute embedding method for the problem of domain transfer learning. The embedding of attributes is based on a convolutional neural network (CNN) model [25,12,13]. The convolutional output of the input data is further mapped to the attribute vector. In this way, the attribute embedding vector not only represents the attributes of a data point but also contains the pattern of the input data constructed by the CNN model, which has been proven to be a powerful representation model. To construct the classification model for each domain, we also learn a domain-independent convolutional representation and a domain-specific convolutional representation. The domain-independent convolutional representation maps the data of different domains to a shared data space to capture the patterns shared over all the domains. The domain-specific convolutional representation is used to represent the patterns specifically contained by each domain. The classification model of each domain is based on the three types of convolutional representations, i.e., the attribute embedding, the domain-independent representation, and the domain-specific representation. To learn the parameters of the models, we propose to minimize the mapping errors of the attributes, the classification errors across different domains, the mismatch of different domains in the domain-independent representation space, and the dissimilarity between the neighboring data points in the target domain. The joint minimization problem is solved by an alternate optimization strategy and the gradient descent algorithm.
* The study was supported by Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, China (Grant No. KJS1324).
Method
In the problem setting of cross-domain learning, we assume we have $T$ domains. The first $T-1$ domains are the auxiliary domains, while the $T$-th domain is the target domain. The problem is to learn an effective model for the classification of the target domain. The input data sets of the $T$ domains are denoted as $\mathcal{X}_t|_{t=1}^{T}$, where $\mathcal{X}_t = \{(X_i^t, a_i^t, y_i^t)\}|_{i=1}^{n_t}$ is the data set of the $t$-th domain. $X_i^t = [x_{i1}^t, \cdots, x_{i|X_i^t|}^t] \in \mathbb{R}^{d \times |X_i^t|}$ is the input matrix of the $i$-th data point of the $t$-th domain, each column of which is the feature vector of an instance, $a_i^t \in \{1,0\}^{|a|}$ is its binary attribute vector, and $y_i^t \in \{1,0\}^{|y|}$ is its class label vector.
We propose to embed the attributes of each input data point to a vector and use the convolutional representation of the input data as the embedding vector. Given the input matrix of a data point, $X$, we represent it by a CNN model composed of a convolutional layer, an activation layer, and a max-pooling layer, denoted as $f_a(X)$. Since this convolutional representation of $X$ is used as its attribute vector embedding, we propose to link it to the attribute vector $a$ by a linear mapping function,
$$ f_a(X) \leftarrow \Theta^\top a, \tag{1} $$
where $\Theta \in \mathbb{R}^{|a| \times m_a}$ is the mapping matrix and $m_a$ is the dimension of $f_a(X)$. To reduce the mapping errors, we propose to minimize the Frobenius norm distance between the convolutional representations and the mapping results for all the data points of all domains,
$$ \min_{f_a, \Theta} \sum_{t=1}^{T} \sum_{i=1}^{n_t} \left\| f_a(X_i^t) - \Theta^\top a_i^t \right\|_F^2. \tag{2} $$
To predict the class labels for the data points in multiple domains, we propose to represent the data of each domain by a domain-independent convolutional representation and a domain-specific convolutional representation. The domain-independent convolutional representation function is shared across all the domains. It tries to extract features relevant to the class labels, but independent of the specific domains. The domain-independent convolutional representation function is also based on a CNN model, denoted as $f_0(X)$. Since the outputs of $f_0(X)$ are domain-independent, we hope the representations of data points from different domains are similar to each other. To this end, we impose that the distributions of the domain-independent representations of different domains are the same. We use the mean vector of the representations of each domain to represent the distribution of the domain. For the $t$-th domain, the mean vector is given as $\frac{1}{n_t} \sum_{i=1}^{n_t} f_0(X_i^t)$. To reduce the mismatch among the domains, we propose to minimize the Frobenius norm distances between the mean vectors of each pair of domains,
$$ \min_{f_0} \sum_{t, t' = 1,\, t < t'}^{T} \left\| \frac{1}{n_t} \sum_{i=1}^{n_t} f_0(X_i^t) - \frac{1}{n_{t'}} \sum_{i=1}^{n_{t'}} f_0(X_i^{t'}) \right\|_F^2. \tag{3} $$
To predict the class labels for the data points of different domains, we also consider the representation of the data points according to their domains. This is the domain-specific representation. The representation is also based on CNN models, and the CNN model of the $t$-th domain applied to a data point $X$ is denoted as $f_t(X)$. To estimate the class label of a data point of the $t$-th domain, we concatenate both the domain-independent and domain-specific convolutional representations of the input data, $f_0(X)$ and $f_t(X)$, and also the attribute embedding of the data, $f_a(X)$.
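The two error terms of (2) and (3) can be illustrated with a minimal sketch in plain Python. It assumes that the convolutional representations have already been computed and are given as short lists of numbers; the function names and the toy values are hypothetical and only serve the illustration, they are not part of the proposed implementation.

```python
from itertools import combinations


def squared_norm(v):
    """Squared Frobenius (Euclidean) norm of a vector."""
    return sum(x * x for x in v)


def map_attributes(theta, a):
    """Compute Theta^T a, with Theta stored as |a| rows of length m_a."""
    return [sum(row[j] * a_k for row, a_k in zip(theta, a))
            for j in range(len(theta[0]))]


def attribute_mapping_error(theta, embeddings, attributes):
    """Term of Eq. (2): sum of ||f_a(X) - Theta^T a||^2 over data points."""
    total = 0.0
    for f_a, a in zip(embeddings, attributes):
        mapped = map_attributes(theta, a)
        total += squared_norm([p - q for p, q in zip(f_a, mapped)])
    return total


def mean_vector(reps):
    """Mean of the representation vectors of one domain."""
    return [sum(col) / len(reps) for col in zip(*reps)]


def domain_mismatch(reps_per_domain):
    """Term of Eq. (3): squared distances between domain means, over pairs."""
    means = [mean_vector(reps) for reps in reps_per_domain]
    return sum(squared_norm([p - q for p, q in zip(mu_t, mu_s)])
               for mu_t, mu_s in combinations(means, 2))


# Toy values (assumed): |a| = 3 attributes, m_a = 2 embedding dimensions.
theta = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
attributes = [[1, 0, 1], [0, 1, 0]]
embeddings = [[2.0, 0.5], [1.0, 2.0]]          # stand-ins for f_a(X)
total_mapping_error = attribute_mapping_error(theta, embeddings, attributes)

# Stand-ins for f_0(X) in three domains of different sizes.
reps_per_domain = [[[0.0, 0.0], [2.0, 2.0]], [[1.0, 3.0]], [[4.0, 1.0], [2.0, 1.0]]]
mismatch = domain_mismatch(reps_per_domain)
```

Because only the domain means enter the second term, domains with different numbers of data points can be compared directly.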
The three representations are concatenated into a longer vector,
$$ f(X) = \begin{bmatrix} f_0(X) \\ f_t(X) \\ f_a(X) \end{bmatrix} \in \mathbb{R}^{m_0 + m_t + m_a}, \tag{4} $$
where $m_0$, $m_t$, and $m_a$ are the dimensions of the three representations, and the longer vector is transformed to a $|y|$-dimensional vector of classification scores by a matrix $U = \begin{bmatrix} U_0 \\ U_t \\ U_a \end{bmatrix} \in \mathbb{R}^{(m_0 + m_t + m_a) \times |y|}$ in a classification function,
$$ h_t(X) = U^\top f(X) = U_0^\top f_0(X) + U_t^\top f_t(X) + U_a^\top f_a(X), \quad t = 1, \cdots, T, $$
where $U_0$, $U_t$, and $U_a$ are the transformation matrices for the domain-independent representation, the domain-specific representation, and the attribute embedding. To learn the parameters of the model, we propose the following minimization,
$$ \begin{aligned} \min_{f_a, f_0, f_t|_{t=1}^{T}, \Theta, U_a, U_0, U_t|_{t=1}^{T}} o = {} & \sum_{t=1}^{T-1} \sum_{i=1}^{n_t} \left\| y_i^t - h_t(X_i^t) \right\|_F^2 + \sum_{i=1}^{l} \left\| y_i^T - h_T(X_i^T) \right\|_F^2 \\ & + C_1 \sum_{t=1}^{T} \sum_{i=1}^{n_t} \left\| f_a(X_i^t) - \Theta^\top a_i^t \right\|_F^2 \\ & + C_2 \sum_{t, t' = 1,\, t < t'}^{T} \left\| \frac{1}{n_t} \sum_{i=1}^{n_t} f_0(X_i^t) - \frac{1}{n_{t'}} \sum_{i=1}^{n_{t'}} f_0(X_i^{t'}) \right\|_F^2 \\ & + C_3 \sum_{i, i' = 1}^{n_T} M_{ii'} \left\| f(X_i^T) - f(X_{i'}^T) \right\|_F^2. \end{aligned} \tag{5} $$
We explain the terms of the objective function $o$ as follows,
─ The classification function is used to predict the class labels, thus we propose to reduce the prediction errors measured by the Frobenius norm distance between the class label vectors and the outputs of $h_t(X)$ for the data points with available label vectors. The first two terms are the classification error terms: the first covers the auxiliary domains, and the second covers the $l$ labeled data points of the target domain.
─ The third term is the attribute mapping error term of (2).
─ The fourth term is the cross-domain matching term of the domain-independent CNN model of (3).
─ For the unlabeled data points in the target domain, we also regularize them by imposing their representations to be consistent with the labeled data points in the neighborhood, so that the supervision information can also be propagated to them. To this end, we hope that for any two neighboring data points in the target domain, their overall representation vectors are close to each other. Thus, in the last term, we minimize the Frobenius norm distance between the representations of neighboring data points in the target domain, where $M_{ii'} = 1$ if $X_i$ and $X_{i'}$ are neighbors of each other and 0 otherwise.
$C_k$, $k = 1, \cdots, 3$ are the tradeoff weights of the different regularization terms. To solve this problem, we propose to use the sub-gradient descent algorithm. The filters of the CNN models, the mapping matrix, and the transformation matrices are updated according to the direction of the sub-gradient of the objective,
$$ \Phi \leftarrow \Phi - \tau \nabla_{\Phi} o, \tag{6} $$
where $\Phi \in \{f_a, f_0, f_1, \cdots, f_T, \Theta, U_a, U_0, U_1, \cdots, U_T\}$, $\tau$ is the descent step, and $\nabla_{\Phi} o$ is the sub-gradient of $o$ regarding $\Phi$. In an iterative algorithm, the parameters in $\Phi$ are updated alternately until a maximum iteration number is reached or convergence is achieved.
Experiment
In this section, we evaluate the proposed method over several domain-transfer problems.
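As a point of reference for the model being evaluated, the classification function of the Method section and a single descent update of its transformation matrices can be sketched as follows. This is a simplified illustration under stated assumptions: the three representations are fixed toy vectors rather than CNN outputs, only the classification error of one labeled data point is considered, the CNN filters are not updated, and all function names are hypothetical.

```python
def mat_t_vec(u, f):
    """Compute U^T f for U stored as len(f) rows of |y| columns."""
    return [sum(u[k][c] * f[k] for k in range(len(f)))
            for c in range(len(u[0]))]


def score(u0, ut, ua, f0, ft, fa):
    """h_t(X) = U_0^T f_0(X) + U_t^T f_t(X) + U_a^T f_a(X)."""
    parts = [mat_t_vec(u0, f0), mat_t_vec(ut, ft), mat_t_vec(ua, fa)]
    return [sum(vals) for vals in zip(*parts)]


def squared_error(y, h):
    """Classification error ||y - h||^2 of one data point."""
    return sum((yc - hc) ** 2 for yc, hc in zip(y, h))


def descent_step(u, f, residual, tau):
    """U <- U - tau * grad, where grad = -2 * f * residual^T
    is the gradient of ||y - U^T f||^2 and residual = y - h_t(X)."""
    return [[u[k][c] + 2.0 * tau * f[k] * residual[c]
             for c in range(len(residual))]
            for k in range(len(f))]


# Toy setup (assumed): two classes, each representation has two dimensions.
f0, ft, fa = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
y = [1.0, 0.0]
u0 = [[0.0, 0.0], [0.0, 0.0]]
ut = [[0.0, 0.0], [0.0, 0.0]]
ua = [[0.0, 0.0], [0.0, 0.0]]
tau = 0.1  # descent step

h = score(u0, ut, ua, f0, ft, fa)
loss_before = squared_error(y, h)
residual = [yc - hc for yc, hc in zip(y, h)]

# Each matrix is updated with the same residual of the full score.
u0 = descent_step(u0, f0, residual, tau)
ut = descent_step(ut, ft, residual, tau)
ua = descent_step(ua, fa, residual, tau)
loss_after = squared_error(y, score(u0, ut, ua, f0, ft, fa))

# The summed form equals the form with one stacked matrix and vector.
h_stacked = mat_t_vec(u0 + ut + ua, f0 + ft + fa)
```

In the full algorithm the filters of the CNN models and the mapping matrix are updated in the same alternating manner, with the remaining terms of the objective contributing to their sub-gradients.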
Data sets and experimental setting
In the experiments, we use three datasets as follows.
The CUHK03 data set was developed for the problem of person re-identification [15]. It contains 13,164 images of 1,360 persons. We annotate each image with 108 attributes, including gender (male/female), wearing long hair, etc. The images are captured by six different cameras. The problem of person re-identification is to train a classifier over the images of some cameras, and then use the classifier to identify an image captured by other cameras. We treat each camera as a domain, and we use each domain as a target domain in turn.
The bankrupt prediction data set contains the stock price wave data of 3 years of 374 companies from three different countries: China, USA, and UK. We collected this data for the problem of predicting company bankruptcy. Each company is also labeled by a list of business type attributes. Each company is treated as a data point, represented by a set of short-term frames and a list of binary attributes of business types. Moreover, each country is treated as a domain. The prediction problem of this data set is to predict whether a given company will go bankrupt within the next 3 years.
The spam email data set is from the spam email detection competition of the ECML/PKDD Discovery Challenge 2006 [3]. It contains the email texts of 15 email users, with 400 emails for each user. Among the 400 emails of each user, half are spam emails, while the remaining half are non-spam emails. Each email text is composed of a set of words. Moreover, we also apply a topic classifier and a sentiment classifier to each email text to extract attributes of the text and use the extracted attributes as additional information. Each user is treated as a domain, and we also use each user as a target domain in turn.
In our experiments, given a data set of several domains, we treat each domain as a target domain in turn, while treating the other domains as the auxiliary domains to help train the model. The data points in a target domain are further randomly split into a training set and a test set of equal sizes. Meanwhile, the training set of the target domain is further split into two equal-sized subsets. One subset is used as a labeled set, and the other is used as an unlabeled set. We train the model over the data points of the auxiliary domains and the training set of the target domain and then test it over the test set of the target domain. The classification rate over the test set is used as the performance measure. The average classification rate over different target domains is reported and compared.
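The protocol above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the fixed random seed, and the placeholder domain names are assumptions, and the data points are represented by integer identifiers instead of real samples.

```python
import random


def split_target_domain(points, seed=0):
    """Randomly split target-domain points into labeled, unlabeled, test."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]
    quarter = len(train) // 2
    labeled, unlabeled = train[:quarter], train[quarter:]
    return labeled, unlabeled, test


def target_rotations(domain_names):
    """Yield (target, auxiliary domains), using each domain as target in turn."""
    names = list(domain_names)
    for target in names:
        yield target, [name for name in names if name != target]


def average_rate(rates):
    """Average classification rate over the different target domains."""
    return sum(rates) / len(rates)


# Example with the size of one spam-email domain (400 emails per user).
emails = list(range(400))
labeled, unlabeled, test = split_target_domain(emails)

# Hypothetical domain names, only to show the rotation.
rotations = list(target_rotations(["domain_1", "domain_2", "domain_3"]))
```

With 400 data points in the target domain, this yields 100 labeled and 100 unlabeled training points and 200 test points.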
Results
In the experiments, we first compare the proposed domain transfer convolutional attribute embedding (DTCAE) algorithm to some state-of-the-art domain-transfer attribute representation methods, and then study the properties of the proposed algorithm experimentally.
Fig. 1. Comparison results over the benchmark data sets.
Attribute embedding for the domain-transfer learning problem is a new topic and there are only two existing methods. In the experiment, we compare the proposed algorithm against these two methods, which are the Joint Semantic and Latent Attribute Modelling (JSLAM) method proposed by Peng et al. [19], and the Multi-Task Learning with Low-Rank Attribute Embedding (MTL-LORAE) method proposed by Su et al. [21]. The comparison results over the three benchmark data sets are shown in Figure 1. According to the reported average accuracies, our algorithm DTCAE achieves the best performance over all three data sets. For example, over the CUHK03 data set, DTCAE is the only compared method with an average accuracy higher than 0.800. Meanwhile, over the spam email data set, only DTCAE obtains an average accuracy higher than 0.900.
Fig. 2. Convergence curves over the benchmark data sets.
Since the proposed DTCAE algorithm is iterative and its variables are updated alternately, we are also interested in its convergence. Thus we plot the average classification rates for different numbers of iterations. The curves over the three benchmark data sets are plotted in Figure 2. According to these curves, when more iterations are used to update the variables of the model, the average classification rates increase steadily. This is not surprising because a larger number of iterations reaches a smaller value of the objective function. This verifies the effectiveness of the proposed model and its corresponding objective function. Moreover, we also observe that when the iteration number is larger than 100, the change of the performance is very small. This means that the algorithm has converged and no more iterations are needed to improve the performance.
Conclusion
In this paper, we propose a novel model for the problem of cross-domain learning with attribute data. We use a CNN model to map the input data to its attributes.
Moreover, a domain-independent CNN model and a domain-specific CNN model are also used to represent the input data itself. The attribute embedding, the domain-independent representation, and the domain-specific representation are concatenated as the new representation of the data points, and we further use a linear layer to map the new representation to the class labels. Moreover, we also impose the domain-independent representations of data points of different domains to follow a common distribution, and the neighboring data points of the target domain to be similar to each other. We model the learning problem as a minimization problem and solve it by an iterative algorithm. The experiments on three benchmark data sets show its advantages. In the future, we plan to apply the proposed method to other applications, such as biometrics [20,27], network analysis [24], human pose estimation [11,10,9], mobile computing [16,8], mathematics [17,2,1,18], etc. A similar approach can also be adopted in other related fields such as systems [6,7,14] and system security [4,5,22].
References
1. Bai, C., Bellier, O., Guo, L., Ni, X.: Splitting of operations, Manin products, and Rota–Baxter operators. International Mathematics Research Notices 2013(3), 485–524 (2013)
2. Bai, C., Guo, L., Ni, X.: Generalizations of the classical Yang-Baxter equation and O-operators. Journal of Mathematical Physics 52(6), 063515 (2011)
3. Bickel, S.: ECML-PKDD discovery challenge 2006 overview. In: ECML-PKDD Discovery Challenge Workshop. pp. 1–9 (2006)
4. Chen, Y., Khandaker, M., Wang, Z.: Pinpointing vulnerabilities. In: Proceedings of the 12th ACM Asia Conference on Computer and Communications Security. pp. 334–345. ACM, Abu Dhabi, United Arab Emirates (2017)
5. Chen, Y., Khandaker, M., Wang, Z.: Secure in-cache execution. In: International Symposium on Research in Attacks, Intrusions, and Defenses. pp. 381–402. Springer (2017)
6. Chen, Y., Wang, Z., Whalley, D., Lu, L.: Remix: On-demand live randomization. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy. pp. 50–61. ACM, New Orleans, LA (2016)
7. Chen, Y., Zhang, Y., Wang, Z., Xia, L., Bao, C., Wei, T.: Adaptive Android kernel live patching. In: Proceedings of the 26th USENIX Security Symposium (USENIX Security 17). USENIX Association, Vancouver, BC (August 2017)
8. Cui, P., Liu, H., He, J., Altintas, O., Vuyyuru, R., Rajan, D., Camp, J.: Leveraging diverse propagation and context for multi-modal vehicular applications. In: Wireless Vehicular Communications (WiVeC), 2013 IEEE 5th International Symposium on. pp. 1–5. IEEE (2013)
9. Ding, M., Fan, G.: Articulated Gaussian kernel correlation for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 57–64 (2015)
10. Ding, M., Fan, G.: Generalized sum of Gaussians for real-time human pose tracking from a single depth sensor. In: Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on. pp. 47–54. IEEE (2015)
11. Ding, M., Fan, G.: Articulated and generalized Gaussian kernel correlation for human pose estimation. IEEE Transactions on Image Processing 25(2), 776–789 (2016)
12. Geng, Y., Liang, R.Z., Li, W., Wang, J., Liang, G., Xu, C., Wang, J.Y.: Learning convolutional neural network to maximize pos@top performance measure. In: ESANN 2017 - Proceedings. pp. 589–594 (2016)
13. Geng, Y., Zhang, G., Li, W., Gu, Y., Liang, R.Z., Liang, G., Wang, J., Wu, Y., Patil, N., Wang, J.Y.: A novel image tag completion method based on convolutional neural transformation. In: International Conference on Artificial Neural Networks. pp. 539–546. Springer (2017)
14. Jin, Y., Wang, T., Zhang, H., Zhang, Y., Zhao, J., Tong, R.: Localized quasi(bi) harmonic field and its applications. Journal of Advanced Mechanical Design, Systems, and Manufacturing 11(4), JAMDSM0047–JAMDSM0047 (2017)
15. Li, W., Zhao, R., Xiao, T., Wang, X.: DeepReID: Deep filter pairing neural network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 152–159 (2014)
16. Liu, H., He, J., Cui, P., Camp, J., Rajan, D.: Astra: Application of sequential training to rate adaptation. In: Sensor, Mesh and Ad Hoc Communications and Networks (SECON), 2012 9th Annual IEEE Communications Society Conference on. pp. 443–451. IEEE (2012)
17. Ni, X., Bai, C.: Prealternative algebras and prealternative bialgebras. Pacific Journal of Mathematics 248(2), 355–391 (2010)
18. Ni, X., Bai, C.: Pseudo-Hessian Lie algebras and L-dendriform bialgebras. Journal of Algebra 400, 273–289 (2014)
19. Peng, P., Tian, Y., Xiang, T., Wang, Y., Pontil, M., Huang, T.: Joint semantic and latent attribute modelling for cross-class transfer learning. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
20. Shao, H., Chen, S., Zhao, J.y., Cui, W.c., Yu, T.s.: Face recognition based on subset selection via metric learning on manifold. Frontiers of Information Technology & Electronic Engineering 16(12), 1046–1058 (2015)
21. Su, C., Yang, F., Zhang, S., Tian, Q., Davis, L., Gao, W.: Multi-task learning with low rank attribute embedding for multi-camera person re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
22. Wang, X., Chen, Y., Wang, Z., Qi, Y., Zhou, Y.: SecPod: a framework for virtualization-based security systems. In: Proceedings of the 2015 USENIX Annual Technical Conference. pp. 347–360 (2015)
23. Yang, L., Zhang, J.: Automatic transfer learning for short text mining. EURASIP Journal on Wireless Communications and Networking 2017(1), 42 (2017)
24. Yu, T., Yan, J., Zhao, J., Li, B.: Joint cuts and matching of partitions in one graph. arXiv preprint arXiv:1711.09584 (2017)
25. Zhang, G., Liang, G., Li, W., Fang, J., Wang, J., Geng, Y., Wang, J.Y.: Learning convolutional ranking-score function by query preference regularization. In: International Conference on Intelligent Data Engineering and Automated Learning. pp. 1–8. Springer (2017)
26.