Knowledge-Enhanced Attentive Learning for Answer Selection in Community Question Answering Systems
Fengshi Jing and Qingpeng Zhang
Abstract.
In the community question answering (CQA) system, the answer selection task aims to identify the best answer for a specific question, and thus plays a key role in enhancing service quality by recommending appropriate answers for new questions. Recent advances in CQA answer selection focus on enhancing performance by incorporating community information, particularly the expertise (previous answers) and authority (position in the social network) of an answerer. However, existing approaches for incorporating such information are limited in (a) considering either the expertise or the authority, but not both; (b) ignoring the domain knowledge needed to differentiate the topics of previous answers; and (c) using the authority information only to adjust the similarity score, instead of fully utilizing it in the process of measuring the similarity between segments of the question and the answer. We propose the Knowledge-enhanced Attentive Answer Selection (KAAS) model, which improves performance by (a) considering both the expertise and the authority of the answerer; (b) utilizing the human-labeled tags, the taxonomy of the tags, and the votes as domain knowledge to infer the expertise of the answerer; and (c) using matrix decomposition of the social network (formed by the following relationship) to infer the authority of the answerer and incorporating this information when evaluating the similarity between segments. In addition, for vertical communities, we incorporate an external knowledge graph to capture more professional information. We then adopt the attention mechanism to integrate the analysis of the question and answer text with the aforementioned community information. Experiments with both vertical and general CQA sites demonstrate the superior performance of the proposed KAAS model.
Affiliation (both authors): School of Data Science, City University of Hong Kong, Hong Kong SAR. Email: [email protected] (*corresponding author). Preprint category: arXiv [cs.AI].
Community question answering (CQA) systems can utilize the expertise of the community to provide timely and personalized service to Web users, and thus have emerged as a key information acquisition platform for both general topics (e.g. Quora and Zhihu) and specific/vertical topics (e.g. HealthTap and Stack Overflow, https://stackoverflow.com/) [1, 2]. Since existing CQA sites have collected rich data of question-answer pairs, it is possible to recommend an existing answer to a newly posted question. This is particularly important for vertical communities for professional knowledge exchange and acquisition. For example, clinical doctors (those who write answers on HealthTap) cannot ensure their availability for 24 hours. If a patient posts a question in the middle of the night, it is unrealistic to expect a doctor to answer it immediately. On the other hand, the question may have already been answered previously. Hence, recommending appropriate answers from the answer pool for new questions can provide needed information to the patient in a timely manner. In addition, there are usually more questions than answers, and many of the questions overlap with each other. These issues cause the question starvation problem, which requires effective answer selection models to enhance service quality by capitalizing on the accumulated question/answer pool [3].
The answer selection task [4] in CQA involves knowledge management and machine learning techniques, with a primary focus on natural language processing (NLP) [5] and knowledge graphs (e.g. question classification [6] and measuring semantic similarity [7, 38]), because such online communities usually do not reveal the identity and detailed demographic information of users [8, 9, 10, 11, 12]. Typically, a long short-term memory (LSTM) framework is employed to learn the text representations and extract features [13, 14, 15, 16]. The attention mechanism has emerged as a common framework for this task due to its capability to capture the interrelations between different segments of the question and the answer [17, 18, 19]. To further enhance performance, recent advances in CQA answer selection [2, 20, 21, 22, 23, 24] go beyond pure NLP by incorporating community information, particularly the expertise (previous answers) and authority (position in the social network) of an answerer. Existing methods mainly adopt a two-phase approach, in which certain statistics are calculated and then imported into the downstream answer selection task. For instance, the count of an answerer’s followers indicates his or her authority; the count of votes/agrees/thanks received by an answer indicates the quality of his or her previous answers [2, 23]. More recent studies utilize the text and tags of an answerer’s previous answers to infer the domain expertise of the answerer [20, 21, 22, 24]. Despite being effective, existing methods for incorporating community information are limited:
(a) They consider either the expertise or the authority, but not both. In practice, both types of information may contribute to predicting the quality and relevance of an answer.
(b) Domain knowledge is not fully utilized to differentiate the topics of previous answers. To be specific, previous studies [21, 22] depict the expertise of an answerer by analyzing the textual information of all the answerer’s previous answers. This approach is appropriate for vertical CQA communities, since the answerers (e.g. a clinical doctor) are not likely to answer questions that are irrelevant to their expertise. However, it is limited for general CQA communities, in which users often answer a wide variety of questions. If the full text of an answerer’s previous answers is used, we may include irrelevant expertise information. For example, a user’s previous answer about Chinese history does not provide any information for a question about a machine learning algorithm. In addition, using the full text of all previous answers requires huge computational resources.
(c) The expertise and authority information (e.g. the count of followers and the expertise of the followers) is only used to adjust the final similarity score (e.g. [22, 24]), instead of being fully utilized in the process of measuring the similarity between segments of the question and the answer.
More specifically, existing approaches assume that an answerer’s authority can be represented by a linear combination of the followers’ specialties. The similarity between the question and an answer is first derived by a model and then adjusted by the answerer’s authority. This is problematic because different specialties are not entirely independent of each other; there could be interrelations among two or more specialties. In addition, such expertise and authority information could be fully utilized in the NLP process that evaluates the similarities among segments of questions and answers.
Figure 1. Framework of the KAAS model. Components shown in the diagram: embedding and biLSTM for the question and the answer; QA-driven interaction (attention, pooling, similarity); community-driven collaboration (historical answers and social network, with tags, votes, interests and topics, decomposed by SVD); and the knowledge graph for vertical CQA.
To fill the aforementioned gaps, we propose the Knowledge-enhanced Attentive Answer Selection (KAAS) model (the framework is shown in Figure 1). First, we introduce an expertise matrix and an authority matrix to capture the expertise information from historical answers and the authority information from the answerer’s followers in the social network, respectively. The two matrices share the same (human-labeled) tag dimension, which represents the predefined topic/specialty structure. To address the sparsity problem caused by the large number of tags, we utilize the taxonomy of tags to group tags that are semantically similar to each other. Second, we extract the latent feature matrices for expertise and authority by decomposing the two corresponding matrices. Third, we adopt an attention mechanism to examine the similarity among segments of questions and answers. The answer’s attention representation is adjusted by the extracted expertise and authority features of the answerer. Finally, the similarity between the question and a candidate answer is generated by taking the inner product of the attentive representation of the question and the adjusted attentive representation of the answer. Experiments with both vertical and general CQA sites demonstrate the superior performance of the proposed KAAS model. For the vertical community (i.e., HealthTap, a CQA site in the medical domain), we introduce an external knowledge graph (i.e., a health knowledge graph) and embed this knowledge graph before attentive learning.
There are three main contributions of our paper. First, KAAS incorporates both the expertise and the authority of the answerer to enhance the performance. Second, we utilize the human-labeled tags and votes as domain knowledge to infer the expertise of the answerer. Third, we propose a matrix decomposition-based method to infer the authority of the answerer and incorporate this information in the process of evaluating the similarity between segments.
To capture the sequential contextual features in free text, LSTM [25], particularly bidirectional LSTM (biLSTM) [26], has been the basic modeling framework for answer selection [27]. Figure 2a presents this general framework. First, an embedding method (e.g. word2vec [28]) is used to encode the text. Second, biLSTM is used to generate the question feature matrix Q and the answer feature matrix A, respectively. Third, column-wise max pooling (or another pooling method) is used to transform the feature matrices into the representing vectors of the question and the answer. Last, the similarity score is obtained by calculating the cosine similarity between the representing vectors.
There are a number of variants that improve over the base biLSTM model, including replacing biLSTM with a convolutional neural network (CNN) as the sentence model [29], using both CNN and biLSTM to jointly learn feature matrices [30], and using multiple biLSTM components to learn the similarity from the feature matrices/vectors [31].
Figure 2. The general neural network framework for answer selection: (a) biLSTM model (embedding and biLSTM, column-wise max pooling, similarity); (b) attentive biLSTM model (embedding and biLSTM, attentive part with row-wise and column-wise max pooling and softmax, similarity).
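The four steps of the base framework (embedding, biLSTM, max pooling, cosine similarity) can be sketched as follows. This is a minimal NumPy illustration and not the authors' implementation: the standard LSTM gate equations are assumed, the random embeddings stand in for word2vec vectors, and the names `init_lstm`, `lstm_states`, `bilstm_states` and `score` as well as all dimensions are hypothetical. Words are stored as rows here, so pooling over word positions is a maximum along axis 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Random parameters for one LSTM direction (illustrative initialisation only)."""
    params = {}
    for gate in ("i", "f", "o", "c"):
        params["W" + gate] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        params["U" + gate] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        params["b" + gate] = np.zeros(hidden_dim)
    return params

def lstm_states(X, p):
    """Run a standard LSTM over X (n_words x input_dim); return n_words x hidden_dim."""
    hidden_dim = p["bi"].shape[0]
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    states = []
    for x in X:
        i = sigmoid(p["Wi"] @ x + p["Ui"] @ h + p["bi"])        # input gate
        f = sigmoid(p["Wf"] @ x + p["Uf"] @ h + p["bf"])        # forget gate
        o = sigmoid(p["Wo"] @ x + p["Uo"] @ h + p["bo"])        # output gate
        c_tilde = np.tanh(p["Wc"] @ x + p["Uc"] @ h + p["bc"])  # input transformation
        c = i * c_tilde + f * c                                  # cell state update
        h = o * np.tanh(c)                                       # hidden state
        states.append(h)
    return np.stack(states)

def bilstm_states(X, fwd, bwd):
    """Concatenate forward and backward hidden states at each word position."""
    forward = lstm_states(X, fwd)
    backward = lstm_states(X[::-1], bwd)[::-1]
    return np.concatenate([forward, backward], axis=1)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def score(question_emb, answer_emb, fwd, bwd):
    Q = bilstm_states(question_emb, fwd, bwd)  # question feature matrix
    A = bilstm_states(answer_emb, fwd, bwd)    # answer feature matrix
    r_q = Q.max(axis=0)                        # max pooling over word positions
    r_a = A.max(axis=0)
    return cosine(r_q, r_a)

rng = np.random.default_rng(0)
fwd, bwd = init_lstm(8, 4, rng), init_lstm(8, 4, rng)
question = rng.normal(size=(5, 8))  # 5 words, 8-dimensional embeddings (made up)
answer = rng.normal(size=(7, 8))    # 7 words
s = score(question, answer, fwd, bwd)
```

In practice the parameters would be trained with a ranking loss; the sketch only shows how the representing vectors and the similarity score are formed.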
Attentive pooling is a method that enables the pooling layer to be aware of the input pair [18]. With an attention mechanism, the information of the input items directly influences the calculation of each other’s representations, and thus enhances the capability of evaluating the similarity between the two inputs [31]. The attentive pooling method (shown in Figure 2b) has recently been adopted as a standard for the answer selection task, in which the two inputs naturally represent the question matrix and the answer matrix (from the biLSTM component) [32, 33]. A recent study further extends the attention matrix to a 3rd-order tensor to consider the relationships among segments within questions or answers [23].
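A minimal sketch of attentive pooling is given below. It assumes the bilinear formulation of attentive pooling networks [18], G = tanh(Q^T U A), followed by row-wise and column-wise max pooling and a softmax; the matrix `U`, the shapes and the function name `attentive_pooling` are illustrative assumptions, and the KAAS interaction described later in the paper uses a different, concatenation-based G.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the maximum for numerical stability
    return e / e.sum()

def attentive_pooling(Q, A, U):
    """Q: (c, M) question features, A: (c, L) answer features, U: (c, c) bilinear weights."""
    G = np.tanh(Q.T @ U @ A)   # (M, L) soft alignment between question and answer segments
    g_q = G.max(axis=1)        # row-wise max: importance of each question segment
    g_a = G.max(axis=0)        # column-wise max: importance of each answer segment
    sigma_q = softmax(g_q)     # attention weights over question segments
    sigma_a = softmax(g_a)     # attention weights over answer segments
    r_q = Q @ sigma_q          # attentive question representation, shape (c,)
    r_a = A @ sigma_a          # attentive answer representation, shape (c,)
    return r_q, r_a, sigma_q, sigma_a

rng = np.random.default_rng(1)
Q = rng.normal(size=(6, 4))    # 4 question segments (made-up sizes)
A = rng.normal(size=(6, 9))    # 9 answer segments
U = rng.normal(size=(6, 6))
r_q, r_a, sigma_q, sigma_a = attentive_pooling(Q, A, U)
```

The two representations are then compared with a similarity measure such as the cosine; because each attention vector depends on both inputs, the same answer receives a different representation for different questions.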
Recent advances [2, 20, 21, 22, 23, 24] in CQA answer selection go beyond pure text mining and incorporate community information. In particular, we can evaluate an answerer’s expertise by analyzing his or her previous answers, and estimate the authority of an answerer by examining his or her topological position in the community. Zhao et al. (2017) propose the Asymmetric Multi-Faceted Ranking Network Learning (AMRNL) model [20], which uses the count of an answerer’s followers to indicate the authority and adjusts the similarity score by introducing the question-authority matching score [24]. Lei et al. (2018) borrow the idea of residual networks and propose the Multi-View Fusion Neural Network (MVFNN) model to take the topics of the question into consideration [17]. Wen et al. (2019) propose the Hybrid Attentive with Deep Users (UIA-LSTM) model, which combines the text of the candidate answer and the text of the corresponding answerer’s previous answers for the subsequent attentive pooling procedure [21, 22]. This approach is effective for vertical CQA communities, where users rarely answer questions outside their domain expertise. However, it might introduce noise for the answer selection task in general CQA communities, because users may answer questions in quite different domains.
First, we perform the word embedding of the original text. Word2vec [28] is used to train the word vectors; note that other word embedding methods may be used. Next, we follow [27] and use the biLSTM to represent the question and the answer. More specifically, we represent a given sentence as $X = (x_1, x_2, \ldots, x_n)$, in which $x_t$ is the embedded vector for the $t$-th word. The hidden vector $h_t$ at time step $t$ in the LSTM component is updated as follows:
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$, (1)
$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$, (2)
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$, (3)
$C'_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$, (4)
$C_t = i_t \ast C'_t + f_t \ast C_{t-1}$, (5)
$h_t = o_t \ast \tanh(C_t)$, (6)
where $\sigma$ is the sigmoid activation function, $i$ represents the input gate, $f$ represents the forget gate, $o$ represents the output gate, $C$ denotes the cell memory, and $W$, $U$, $b$ are network parameters. Formula (4) is the input transformation, and formula (5) updates the cell state.
The standard LSTM only uses the information of the past. biLSTM, on the other hand, utilizes both the previous and the future context by processing the sequence in two directions, and generates two independent sequences of LSTM output vectors. Because biLSTM models the context information for each word, a biLSTM-based representation is usually more accurate than an LSTM-based one in the answer selection task [18]. In our model, the biLSTM output at each time step is the concatenation of the two output vectors from both directions, i.e., $h_t = h_t^f \,\|\, h_t^r$.
Previous answers provided by an answerer can be used to represent the expertise of the answerer. In particular, a CQA site usually has a tag function that helps users manually label the questions and answers according to a predefined categorical system of the domain knowledge. Such rich information can help us model the answerers’ expertise at a high resolution (as compared to only using the text in the answers). We adopt a collaborative filtering [34, 35] approach to capture the relationship between a tag and an answerer based on the relationships among all tags and previous answers. Because of the large number of tags, there is a sparsity problem: we do not have sufficient data to learn the representation of each tag [36]. Therefore, we make use of the taxonomy of the tags to group semantically similar tags into a single higher-level tag. We illustrate the grouping procedure in Figure 3 using the two datasets used in this study.
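The grouping step can be sketched with a plain dictionary lookup. The taxonomy below is a hypothetical two-level fragment whose entries follow the HealthTap example in Figure 3; the name `group_tags` and the choice to keep tags that are missing from the taxonomy unchanged are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical fragment of a two-level taxonomy: lower-level tag -> immediate higher-level tag.
TAXONOMY = {
    "depression": "psychiatry",
    "anxiety": "psychiatry",
    "pimples": "dermatology",
    "acne": "dermatology",
}

def group_tags(tags, taxonomy=TAXONOMY):
    """Count higher-level tags; tags that are not in the taxonomy are kept unchanged."""
    return Counter(taxonomy.get(tag, tag) for tag in tags)

grouped = group_tags(["depression", "anxiety", "acne"])
```

After grouping, a tag count can exceed 1 (two lower-level tags fall under "psychiatry" here), which makes the tag dimension denser than the raw tags.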
Figure 3. Grouping procedure using the two datasets (vertical CQA: HealthTap; general CQA: QatarLiving). For both CQA communities, we group the lower-level tags to the immediate higher-level tags. For example, "depression" and "anxiety" are grouped to "psychiatry," and "working" and "salary" are grouped to "work related."
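Once tags are grouped, the answer-tag weighting defined in the next paragraph (tag frequency multiplied by the vote measure, equation (7)) and its decomposition (equation (8)) can be sketched as follows. This is a minimal illustration under stated assumptions: the answers, votes and tag index are made up, `answer_tag_matrix` and `truncated_svd` are hypothetical helper names, and the number of retained singular values `k` is a free parameter that the paper says is set by numerical experiments.

```python
import numpy as np

def answer_tag_matrix(answers, tag_index):
    """answers: list of (tag_counts, vote). Returns H with H[i, j] = f_i(j) * v(i)."""
    H = np.zeros((len(answers), len(tag_index)))
    for i, (tag_counts, vote) in enumerate(answers):
        for tag, freq in tag_counts.items():
            H[i, tag_index[tag]] = freq * vote
    return H

def truncated_svd(M, k):
    """Keep the top-k singular triplets; NumPy returns the tag factor already transposed."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

tag_index = {"psychiatry": 0, "dermatology": 1}   # grouped tags (illustrative)
answers = [
    ({"psychiatry": 2}, 3.0),                     # two psychiatry tags, 3 up-votes
    ({"dermatology": 1}, 1.0),
    ({"psychiatry": 1, "dermatology": 1}, 2.0),
]
H = answer_tag_matrix(answers, tag_index)
U_a, sigma_a, Vt_a = truncated_svd(H, k=2)
H_rebuilt = U_a @ np.diag(sigma_a) @ Vt_a         # equals H when k is the full rank
```

With a smaller `k` the product is only an approximation of H, which is the low-rank setting in which a small number of top singular values represents the whole matrix.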
After the tag grouping process, we define a weight $H^a_{ij}$ to represent answerer $a$’s expertise in tag $j$ as expressed by answer $i$. $H^a_{ij}$ is measured by the product of the frequency of a certain tag in an answer and the vote measure for this answer as follows
$H^a_{ij} = f_i(j) \cdot v(i)$, (7)
where $a$ denotes the answerer, $f_i(j)$ denotes the frequency of tag $j$ in the previous answer $i$, and $v(i)$ denotes the vote measure for answer $i$. For brevity, we omit the answerer identifier $a$ in the rest of the paper, since we focus on modeling the answers and followers of a single answerer (no interactions between competing answerers). For the CQA community that exhibits the actual count of up-votes (HealthTap), the vote measure is the count. For the CQA community that only provides a categorical evaluation of the quality of an answer (QatarLiving; i.e. "Good"/"Potentially Useful"/"Bad"), we set a numeric value for each category: Good = 2, Potentially Useful = 1, and Bad = 0. For an answerer, we construct an answer-tag quality matrix $H \in \mathbb{R}^{A \times T}$, where $A$ is the number of answers posted by the answerer and $T$ is the total number of tags that are shared by all answers. Then, we employ the singular-value decomposition (SVD) [37] to decompose $H$ and obtain the feature matrices for the answerer’s previous answers (expertise) and the tags as follows
$H = U_a \Sigma_a V_a$, (8)
where $U_a$ represents the feature matrix for the answerer’s previous answers, $\Sigma_a$ represents the scaling matrix, and $V_a$ represents the feature matrix for the tags.
Authority representation based on social networks
The topological position of an answerer in the social network formed by the following relationship represents the authority of the answerer. In addition, the interests/specialties of an answerer’s followers (expressed by tags) can be used to further enrich the representation of the answerer’s authority. Existing methods infer the answerer’s authority as a linear combination of the followers’ tags [34, 35]. However, this approach makes the strong assumption that the tags are independent. Similarly to the expertise representation, we employ the SVD [37] to capture the relationship between an answerer and his or her followers (and their tags) based on the relationships among all tags and followers. Semantically similar tags are also grouped into a single higher-level tag (as shown in Figure 3).
After the tag grouping process, we define a weight $S_{ij}$ to represent the answerer’s authority in tag $j$ as inferred from the answerer’s follower $i$. $S_{ij}$ is measured by the frequency of a certain tag of a follower as follows
$S_{ij} = f_i(j)$, (9)
where $f_i(j)$ denotes the frequency of tag $j$ of the answerer’s follower $i$. Note that this tag is labeled by the follower $i$. Before tag grouping, the frequency for each tag is either 1 or 0. After tag grouping, the frequency refers to the number of lower-level tags labeled by $i$. For example, if the follower is labeled with "depression" and "anxiety," both of which are grouped to "psychiatry," the frequency of "psychiatry" for $i$ is 2. For the followers, we construct a follower-tag quality matrix $S \in \mathbb{R}^{F \times T}$, where $F$ is the number of the answerer’s followers and $T$ is the total number of tags that are shared by all. Then, we employ the SVD to decompose $S$ and obtain the feature matrices for the answerer’s social network and the tags as follows
$S = U_s \Sigma_s V_s$, (10)
$V_s = V_a$, (11)
where $U_s$ represents the feature matrix for the answerer’s social network (authority), $\Sigma_s$ represents the scaling matrix, and $V_s$ represents the feature matrix for the tags, which is set equal to $V_a$.
As for the tag matrix, the SVD is not unique, and a small portion of the top singular values can approximately represent the whole matrix; we therefore first approximate the matrix $V_a$, and then set $V_s$ and $V_{kg}$ equal to it. The SVD setting is chosen based on numerical experiments.
In vertical community question answering systems, there is a large amount of professional knowledge. To make full use of this kind of knowledge, we incorporate an external knowledge graph. For example, in this paper, for the HealthTap site, we introduce a health knowledge graph (as shown in Figure 4), which is derived from Electronic Medical Records (EMR) [43] and can include more professional medical knowledge. There are high-quality knowledge bases linking diseases and symptoms, and diseases can similarly be grouped into single higher-level tags. We then define a weight $KG_{ij}$ to represent the candidate answer’s total weight in tag $j$ in terms of the answer’s symptom concept $i$. $KG_{ij}$ is measured by the frequency of a certain tag of symptoms as follows
$KG_{ij} = w_i(j)$, (12)
where $w_i(j)$ denotes the total weight of tag $j$ in terms of the answer’s symptom concept $i$. For the symptom concepts, we construct a symptom-tag quality matrix $KG \in \mathbb{R}^{SC \times T}$, where $SC$ is the number of the answer’s symptom concepts and $T$ is the total number of tags that are shared by all. Then, we employ the SVD to decompose $KG$ and obtain the feature matrices for the answer’s symptom concepts and the tags as follows
$KG = U_{kg} \Sigma_{kg} V_{kg}$, (13)
$V_s = V_a = V_{kg}$, (14)
where $U_{kg}$ represents the feature matrix for the answer’s symptom concepts (relationship knowledge graph), $\Sigma_{kg}$ represents the scaling matrix, and $V_{kg}$ represents the feature matrix for the tags, which is set equal to $V_a$ and $V_s$.
Figure 4.
Health Knowledge Graph [43]. Different diseases (i.e., lower-level tags; e.g., pink eye, cataracts) are linked in the graph to symptom concepts (e.g., pain, redness, itching, vertigo). From this knowledge graph, we can see that ophthalmic (a higher-level tag) has a summed relationship weight of 0.198 = 0.035 + 0.[…] with pain (a symptom concept which might be mentioned in answers).
To capture the relationship between each segment in the question and each segment in the answer, we follow [23] to construct a concatenation tensor $G$ as follows
$G_{ij} = \sigma(W \ast [h_{q(i)}, h_{a(j)}] + b)$, (15)
where $i = 1, 2, \ldots, M$; $j = 1, 2, \ldots, L$; $\sigma$ is the sigmoid activation function, $W$ is the transformation matrix, $h_{q(i)}$ is the $i$-th hidden state of the biLSTM question representations, $h_{a(j)}$ is the $j$-th hidden state of the biLSTM answer representations, and $b$ is the bias vector. $M$ is the length of the question and $L$ is the length of the answer.
Following the standard attentive pooling scheme [18], we use row-wise pooling to obtain an interaction matrix $r_q$, which captures the relationship between each segment in the question and all segments in the answer, and column-wise pooling to obtain an interaction matrix $r_a$, which captures the relationship between each segment in the answer and all segments in the question:
$r_q(i) = \max$ […]
[…] KAAS with that of the following state-of-the-art models. PLANE is a non-neural network method based on statistical NLP feature extraction. It has an offline learning component and an online search component [2]. LSTM is the basic biLSTM model without the attentive component [27]. AP-LSTM has a similar biLSTM architecture with the attentive pooling component [18]. AI-CNN takes the interaction of the sentence pair into consideration, resulting in a 3D tensor that captures the relationship among the segments [23]. AI-CNN-F computes the similarity by adding additional community information (received thanks and agrees) [23]. MVFNN models the answer selection task with a multi-view fusion neural network based on the idea of residual networks [17]. AMRNL uses a linear combination of followers’ tags to represent the authority information, and adjusts the final matching score [20].
Table 1.
Performance on HealthTap (HT) dataset.
Model       P@1      P@2
PLANE       33.2%    52.6%
LSTM        37.7%    59.5%
AP-LSTM     39.5%    62.4%
AI-CNN      39.8%    62.9%
AI-CNN-F    40.1%    63.7%
MVFNN       40.1%    63.5%
AMRNL       39.4%    62.9%
KAAS        […]      […]
Table 2. Performance on QatarLiving (QL) dataset.
Model       MAP      Accuracy   F1
PLANE       69.7%    66.5%      60.0%
LSTM        75.3%    74.1%      70.9%
AP-LSTM     77.1%    75.5%      71.7%
AI-CNN      79.2%    76.3%      72.8%
AI-CNN-F    80.1%    76.9%      73.0%
MVFNN       80.0%    77.3%      73.3%
AMRNL       70.4%    68.5%      62.7%
KAAS        […]      […]        […]
Tables 1 and 2 present the performance on the HealthTap and QatarLiving datasets. In general, the proposed KAAS model consistently outperforms the state-of-the-art baseline models (footnote: https://sites.google.com/view/jingfengshi/home/blog/code). More specifically, for the HealthTap dataset, we use P@1 and P@2 to measure the accuracy. P@K is the frequency of successfully predicting the best K answers. As the only non-neural network model, PLANE has the lowest accuracy, but it has an advantage in computational efficiency. From LSTM to AP-LSTM and to AI-CNN/AI-CNN-F, the performance improves with the additional attentive pooling framework and community information. MVFNN’s performance is similar to that of AI-CNN-F because it also utilizes simple community information. AMRNL, on the other hand, only reaches a performance similar to that of AP-LSTM, indicating that the linear combination of tags is less effective than expected, probably due to the sparsity problem in the tag distributions.
Because the QatarLiving dataset provides a categorical evaluation (instead of a vote count) of the answer, we adopt MAP, Accuracy, and F1-score to evaluate the performance. We have a similar finding: the attentive pooling framework and the inclusion of community information improve the performance. The proposed KAAS model consistently performs the best.
Figure 5. Parameter analysis for two datasets: (a) HealthTap data, P@1 in terms of hidden size; (b) QatarLiving data, P@1 in terms of hidden size.
We further analyze the sensitivity of the KAAS model in terms of the size of the biLSTM hidden layer. As shown in Table 3 and Figure 5, the size of the hidden layer influences the performance of the model, and there is a trade-off between model complexity and performance. In particular, when the hidden layer is small, we can improve the performance by increasing its size. However, when the hidden layer size is larger than a change point, the performance declines, which could be due to overfitting and the lack of sufficient data to fit the additional parameters. The size of 128 in the previous experiments is set based on this sensitivity analysis. Finally, we conduct ablation studies, as shown in Table 4. Among the three parts (i.e., authority, expertise and knowledge graph), the expertise information is the most significant. In the future, we plan to optimize the matrix decomposition part to further enhance the full model’s performance.
Table 3. The performance with different sizes of the hidden layer.
Hidden size   32       64       128    256    512
P@1 HT        38.8%    39.1%    […]
Table 4. Results of ablation studies on the HT dataset.
Model                                  P@1
KAAS None                              39.60%
KAAS Expertise                         40.23%
KAAS Authority                         39.95%
KAAS Knowledge Graph                   39.73%
KAAS Expertise & Authority             40.32%
KAAS Authority & Knowledge Graph       40.07%
KAAS Knowledge Graph & Expertise       40.11%
KAAS Full                              40.38%
In this paper, we propose the KAAS model for the CQA answer selection task. KAAS is based on a biLSTM neural network with an attentive pooling mechanism. It incorporates both the expertise and the authority information learned from the answerer’s previous answers, the tags of the answers, the followers of the answerer, and the tags of the followers. For vertical communities, an external knowledge graph is also utilized, which can capture semantic information between questions and answers [41, 42, 43].
In the end, experiments with both general and vertical CQA datasets show that the KAAS model outperforms state-of-the-art answer selection models.
The novelty of our model comes from the incorporation of community information and domain knowledge. We combine existing techniques with an efficient modeling framework. The model presents a generic framework that incorporates community information as well as external knowledge, which do not necessarily have to be in the same format in different datasets. We can easily modify the SVD and attention components to incorporate different types of community information that can be extracted from other datasets, or other types of side information. For CQA sites where no author information is available, we can still incorporate external knowledge graphs, or we may crawl information from websites. Hence, our proposed model is both general and novel, and it can be applied rather easily to other recommendation problems in CQA.
In conclusion, this paper sheds light on the efficacy of community information in inferring the expertise and authority of answerers, and could inform future research to better mine and utilize the domain knowledge hidden in the CQA community. One possible future direction is to optimize the decomposition process. Because its influence on the downstream learning task has to be considered, this optimization is difficult but worth attempting. Another opportunity is to generate a knowledge graph from the CQA community itself, which might be more suitable for revealing semantic information between the community questions and answers, and is thus a rewarding but very challenging direction for future work.
REFERENCES
[1] Yuan, S., Zhang, Y., Tang, J., Hall, W. & Cabot, J. B. (2019). Expert finding in community question answering: a review. Artificial Intelligence Review, in press.
[2] Nie, L., Wei, X., Zhang, D., Wang, X., Gao, Z. & Yang, Y. (2017). Data-driven answer selection in community QA systems. IEEE Transactions on Knowledge and Data Engineering […]
[3] […] IEEE Transactions on Knowledge and Data Engineering, 27(8), 2107-2119.
[4] Sun, R., Cui, H., Li, K., Kan, M. Y. & Chua, T. S. (2005). Dependency relation matching for answer selection. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 651-652). ACM.
[5] Zhang, D. & Lee, W. S. (2003). Question classification using support vector machines. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 26-32). ACM.
[6] Manning, C. D. & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
[7] Moschitti, A., Quarteroni, S., Basili, R. & Manandhar, S. (2007). Exploiting syntactic and shallow semantic kernels for question answer classification. In Proceedings of the 45th annual meeting of the association of computational linguistics (pp. 776-783). ACL.
[8] Heilman, M. & Smith, N. A. (2010). Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 1011-1019). ACL.
[9] Xue, X., Jeon, J. & Croft, W. B. (2008). Retrieval models for question and answer archives. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 475-482). ACM.
[10] Rao, J., He, H. & Lin, J. (2017). Experiments with convolutional neural network models for answer selection. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1217-1220). ACM.
[11] Qiu, X. & Huang, X. (2015). Convolutional neural tensor network architecture for community-based question answering. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (pp. 1305-1311). IJCAI.
[12] Yang, X., Khabsa, M., Wang, M., Wang, W., Awadallah, A., Kifer, D. & Giles, C. L. (2018). Adversarial training for community question answer selection based on multi-scale matching. arXiv preprint arXiv:1804.08058.
[13] Wang, D. & Nyberg, E. (2015). A long short-term memory model for answer sentence selection in question answering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol. 2, pp. 707-712). ACL.
[14] Tan, M., Santos, C. D., Xiang, B. & Zhou, B. (2015). LSTM-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108.
[15] Hao, Y., Liu, X., Wu, J. & Lv, P. (2019). Exploiting Sentence Embedding for Medical Question Answering. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence. AAAI.
[16] Wu, F., Duan, X., Xiao, J., Zhao, Z., Tang, S., Zhang, Y. & Zhuang, Y. (2017). Temporal interaction and causal influence in community-based question answering. IEEE Transactions on Knowledge and Data Engineering […]
[17] […] Proceedings of the 32nd AAAI Conference on Artificial Intelligence (pp. 5422-5429). AAAI.
[18] Santos, C. D., Tan, M., Xiang, B. & Zhou, B. (2016). Attentive pooling networks. arXiv preprint arXiv:1602.03609.
[19] Huang, H., Wei, X., Nie, L., Mao, X. & Xu, X. S. (2018). From Question to Text: Question-Oriented Feature Attention for Answer Selection. ACM Transactions on Information Systems (TOIS) […]
[20] […] Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 3532-3538). AAAI.
[21] Wen, J., Ma, J., Feng, Y. & Zhong, M. (2018). Hybrid Attentive Answer Selection in CQA With Deep Users Modelling. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (pp. 2556-2563). AAAI.
[22] Wen, J., Tu, H., Cheng, X., Xie, R. & Yin, W. (2019). Joint modeling of users, questions and answers for answer selection in CQA. Expert Systems with Applications […]
[23] […] Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 3525-3531). AAAI.
[24] Zhao, Z., Zhang, L., He, X. & Ng, W. (2014). Expert finding for question answering via graph regularized matrix completion. IEEE Transactions on Knowledge and Data Engineering […]
[25] […] Neural computation […]
[26] […] Neural Networks […]
[27] […] arXiv preprint arXiv:1308.0850.
[28] Mikolov, T., Chen, K., Corrado, G. & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[29] Hu, B., Lu, Z., Li, H. & Chen, Q. (2014). Convolutional neural network architectures for matching natural language sentences. In Advances in neural information processing systems (pp. 2042-2050). NIPS.
[30] Tan, M., Dos Santos, C., Xiang, B. & Zhou, B. (2016). Improved representation learning for question answer matching. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 464-473). ACL.
[31] Yin, W., Yu, M., Xiang, B., Zhou, B. & Schütze, H. (2016). Simple question answering by attentive convolutional neural network. arXiv preprint arXiv:1606.03391.
[32] Bian, W., Li, S., Yang, Z., Chen, G. & Lin, Z. (2017). A compare-aggregate model with dynamic-clip attention for answer selection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 1987-1990). ACM.
[33] Xiang, Y., Chen, Q., Wang, X. & Qin, Y. (2017). Answer selection in community question answering via attentive neural networks. IEEE Signal Processing Letters […]
[34] […] Proceedings of the 1994 ACM conference on Computer supported cooperative work (pp. 175-186). ACM.
[35] Marlin, B. M. (2004). Modeling user rating profiles for collaborative filtering. In Advances in neural information processing systems (pp. 627-634). NIPS.
[36] Nakatsuji, M., Fujiwara, Y., Uchiyama, T. & Toda, H. (2012). Collaborative filtering by analyzing dynamic user interests modeled by taxonomy. In International Semantic Web Conference (pp. 361-377). ISWC.
[37] Golub, G. H. & Reinsch, C. (1971). Singular value decomposition and least squares solutions. In Linear Algebra (pp. 134-151). Berlin: Springer.
[38] Weston, J., Bengio, S. & Usunier, N. (2011). Wsabie: Scaling up to large vocabulary image annotation. In The 22nd International Joint Conference on Artificial Intelligence (pp. 2764-2770). IJCAI.
[39] Deng, Y., Xie, Y., Li, Y., Yang, M., Du, N., Fan, W. & Shen, Y. (2018). Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering. arXiv preprint arXiv:1812.02354.
[40] Nakov, P., Hoogeveen, D., Màrquez, L., Moschitti, A., Mubarak, H., Baldwin, T. & Verspoor, K. (2017). SemEval-2017 task 3: Community question answering. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) (pp. 27-48). ACL.
[41] Wei, X., Huang, H., Nie, L., Zhang, H., Mao, X. L. & Chua, T. S. (2016). I know what you want to express: sentence element inference by incorporating external knowledge base. IEEE Transactions on Knowledge and Data Engineering […]
[42] […] Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (pp. 105-113). ACM.
[43] Rotmensch, M., Halpern, Y., Tlimat, A., Horng, S. & Sontag, D. A. (2017). Learning a health knowledge graph from electronic medical records.