A deep belief network-based method to identify proteomic risk markers for Alzheimer disease


Ning An

Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education, Hefei University of Technology, Hefei, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China [email protected]

Liuqi Jin

Huitong Ding

Jiaoyun Yang †

Key Laboratory of Knowledge Engineering with Big Data of Ministry of Education, Hefei University of Technology, Hefei, China; School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China [email protected]

Jing Yuan

Department of Neurology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing, China [email protected]

ABSTRACT

While a large body of research has formally identified apolipoprotein E (APOE) as a major genetic risk marker for Alzheimer’s disease, accumulating evidence supports the notion that other risk markers may exist. The traditional Alzheimer’s-specific signature analysis methods, however, have not been able to make full use of rich protein expression data, especially the interaction between attributes. This paper develops a novel feature selection method to identify pathogenic factors of Alzheimer’s disease using proteomic and clinical data. This approach takes the weights of network nodes as the importance order of signaling protein expression values. After generating and evaluating the candidate subsets, the method selects an optimal subset of proteins that achieves an accuracy greater than 90%, which is superior to traditional machine learning methods for clinical Alzheimer’s disease diagnosis. Besides identifying a proteomic risk marker and further reinforcing the link between metabolic risk factors and Alzheimer’s disease, this paper also suggests that adiponectin-linked pathways are a possible therapeutic drug target.

CCS CONCEPTS

• Computing methodologies • Machine learning • Learning paradigms • Supervised learning • Supervised learning by classification

KEYWORDS

Feature selection, Deep belief network, Alzheimer’s disease, Proteomic risk markers, ACRP30

Introduction

Alzheimer’s disease (AD) is a common form of neurodegenerative disease that is estimated to affect 131 million people worldwide by 2050 [1]. AD accounts for 60% to 70% of all dementia cases [2, 3], and it can progress for years before symptoms become detectable by conventional means. One approach categorizes AD into two subtypes according to the age at onset: early-onset AD and late-onset AD.

† Corresponding author

Ning An et al.

Approximately 60% of early-onset AD (EOAD) cases have a history of multiple family members with AD, and about 13% of these cases are inherited in an autosomal dominant manner that affects at least three generations [4, 5]. Except for several families with single-gene disorders, most cases present complex situations caused by multiple susceptibility genes and environmental factors [6-8]. Late-onset AD (LOAD) is the prevalent form, with onset after 60 or 65 years of age. Overall, more than 90% of AD patients are sporadic cases that belong to LOAD [9]. Studies have shown that there is a genetic component in LOAD, but no study has identified a causative gene. Many genetic studies have, however, consistently linked the APOE gene to sporadic LOAD [10-14]. Other, still unidentified genetic or environmental factors are likely involved, as many people with the APOE risk allele (ε4) live into their 90s.

Diagnostic markers should reflect the pathogenic processes of AD, including the degeneration of neurons and synapses [15]. Using traditional methods, some studies have identified tau and the 42-amino-acid β-amyloid peptide (Aβ42) as markers that meet this criterion [16-18]. In such studies, analysis of variance compares AD cases with healthy controls, the Pearson correlation coefficient assesses the correlation between markers and AD, and sensitivity and specificity are calculated at a cutoff value to reflect the proportion of patients with different indicators [19]. These studies, however, require long-term follow-up for clinical neurochemical analyses and assay the two markers weekly, which makes the process notably time-consuming and complicated. There is a great need for a more effective way to evaluate the diagnostic potential of proteins.

Recently, researchers have focused on the development of diagnostic tools for AD [20, 55]. The analysis of multiple imaging modalities is the primary means to understand the pathogenesis and identify the diagnostic markers of AD. For example, researchers have made valuable findings with Magnetic Resonance Imaging (MRI) [21], Positron Emission Tomography (PET) [22], and functional MRI (fMRI) [23]. Several studies have investigated a few cerebrospinal fluid (CSF) proteins [24]. Fewer studies have involved serum proteins [25] or used both CSF and serum proteins [26]. Although researchers have worked on improving diagnostic methods over the last several decades, there is still a great need for a fully automated and less subjective method that can detect the disease at earlier stages. This paper deploys a deep belief network (DBN) based method for AD diagnosis using expression data for 120 signaling proteins.

Identifying useful risk markers plays a vital role in determining AD, especially markers that are detectable in the early stage of the disease.

Such identification enables early intervention to prevent neurodegeneration. In addition, real-time testing of these markers during the intervention can help evaluate its effect.

Identifying the genes responsible for complex diseases can be very challenging. The most significant limitation of the traditional biological identification methods is the scalability of unique targets: these methods depend on hypotheses of biological significance that are determined a priori. There are also statistical methods to identify potential diagnostic biomarkers. Eric Nagele [27] evaluated the differential significance of each biomarker and selected ten autoantibody biomarkers that could effectively differentiate AD, but that study did not explain the biological mechanisms that could account for the findings, and its prediction accuracy still needs to be verified by constructing other models. In contrast, the proposed method selects, in an automated way, the biomarkers that maximize diagnostic performance. It can also identify pathogenic factors that are biologically significant, an important breakthrough in understanding the underlying pathology.

One study formulates biomarker identification from signaling protein expression data as a feature selection problem [28]. In this context, the purpose is to select the group of proteins that is most effective in diagnosing AD. Yang and others consider this problem, to which various search strategies can be applied [29]. Besides dimensionality reduction, feature selection methods still need improvement in other aspects, including interpretability, time efficiency, and generalization ability [30]. Researchers have categorized feature selection methods into filter-based, wrapper-based, and embedded methods, according to how they combine the feature search with the classification task.
Filter methods evaluate each feature separately and ignore the interactions among features. Moreover, their feature importance assessment is independent of the classification task. Chaves proposed a filter-based method for AD based on an association rule mining algorithm [31]. Computing the correlation between features and labels can help select features in classification tasks: valuable features are highly correlated with the label while weakly correlated with other features. As a commonly used measure, Spearman’s correlation coefficient assesses the degree of correlation between two features.

Traditional machine learning methods focus on a shallow learning structure that contains only a nonlinear transformation. The caveat of the shallow structure is that, limited by sample size and computational units, it cannot represent complex functions. The complexity of the classification problem further limits its generalization capacity. There is a need to demonstrate that a non-independent feature representation leads to a better feature set.

Neuroscience research has shown that the human visual system uses a multi-layered system for information processing. This clear hierarchy dramatically reduces the complexity of data processing in the visual system while retaining useful information about object structure. The success of machine learning largely depends on data representation. Learning representations of the data helps extract useful information for classification tasks. These learned representations reveal the nature of the data and can be transferred to other tasks [32]. Practical and theoretical experience indicates that it is beneficial to use deep architectures to learn complex functions that can represent high-level abstractions. Recently, deep learning-based methods have achieved better performance than other methods in many applications, but little work has used them for AD prediction with protein expression data.

Despite numerous studies identifying other AD genetic risk factors, APOE [33, 34] remains the best predictive gene for AD on which there is scientific consensus. Previous biological studies have hypothesized that adiponectin is also possibly related to AD [35].
This paper proposes a novel feature selection method that is based on multiple levels of data representation and ranks the features by the weights of the deep network. ACRP30 is a protein encoded by the ADIPOQ gene, in which mutations have been associated with adiponectin deficiency. This paper seeks to determine whether ACRP30 is a potential risk marker for AD diagnosis. Furthermore, we examine whether the proposed deep learning framework can achieve diagnostic accuracy comparable to that of more widely used methods.

Proposed method

This paper treats AD diagnosis as a classification problem. The objective is to judge the cognitive status of people according to their signaling protein expression values. The proposed diagnostic method uses a Deep Belief Network (DBN), a probabilistic generative neural network consisting of multiple layers of Restricted Boltzmann Machines (RBMs). The favorable performance of DBNs is partly due to the stochastic nature of the RBM, which gives the learned representations a form of robustness to corruption despite noise during training. A DBN thus satisfies the good intermediate representation criterion, that is, robustness to the partial destruction of the input [31]. To identify the proteomic risk marker, this paper also proposes a belief network-based feature selection method that uses the weights of the network to indicate the importance of the features.
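To make the RBM building block concrete, the following is a minimal sketch of one contrastive-divergence (CD-1) update for a single binary RBM, the kind of layer that a DBN stacks. The paper does not state its training procedure, so the choice of CD-1, the function names, the learning rate, and the toy data are all illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for every hidden unit; W[i][j] links visible i to hidden j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """P(v_i = 1 | h) for every visible unit."""
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single sample; updates W, b, c in place."""
    ph0 = hidden_probs(v0, W, c)
    # Stochastic hidden states (the "stochastic nature" the text refers to).
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    v1 = visible_probs(h0, W, b)  # reconstruction, kept as probabilities
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            # Positive phase minus negative phase.
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return v1


# Toy data (hypothetical): 4 visible units, 2 hidden units.
rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
for _ in range(200):
    for sample in data:
        reconstruction = cd1_update(sample, W, b, c, lr=0.1, rng=rng)
```

In a DBN, the hidden probabilities of one trained RBM would serve as the visible input of the next; real-valued expression data would first need scaling to [0, 1] or a Gaussian visible layer, which this sketch does not cover.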

AD diagnosis

As shown in Figure 1, the input data consist of 120 protein expression values measured in 90 AD cases and 90 non-demented controls (NDC). A DBN model uses this input data as the training dataset. The hidden layer aims to reduce the dimension of the raw data and learn a better representation of the features hidden in the inputs. This model can be useful for any feature extraction task, including tasks on biological data and tasks whose classes are not linearly separable.

Figure 1: The proposed framework for AD diagnosis.

After training the DBN, the proposed method uses the learned weights to initialize a back-propagation neural network (BPNN) for AD diagnosis. A BPNN is a multi-layer feedforward network trained with the error backpropagation algorithm. It learns and stores many input-output mappings without requiring prior mathematical equations that describe these mappings. The learning rule used to adjust the weights and thresholds is the steepest descent method.

This paper is interested in two tasks with this model. The first task is to predict whether a participant’s AD diagnosis is associated with a particular set of genomic features, such as gene expression data. The training dataset has two classes, encoded as 0.9 for AD and 0.1 for NDC rather than 1 and 0, because the sigmoid activation function only approaches 0 and 1 asymptotically.

The second task is feature selection. In the field of machine learning, most methods use learned features to realize their unique targets, and these features have no direct relation to the original ones. In the era of big biological data, there is an urgent need for effective dimension reduction methods that form characteristics by selecting among the given input attributes. Such methods can extract key factors and reduce the number of indexes needed to solve the problem. When training completes, the weights connecting the nodes of adjacent layers represent each node’s contribution to the activation values of the nodes in the upper layer. One can therefore rank the input nodes by the sum of the absolute values of the weights that connect each input node to all nodes in the upper layer, and choose the more important ones. This paper adjusts the threshold on this sum to select a smaller subset of plasma signaling proteins on which the method achieves comparable diagnostic accuracy.
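The weight-based ranking described above can be sketched as follows. This is a minimal illustration, assuming the trained input-to-hidden weights are available as a list of rows (one row per input protein); the function names, the example weights, and the threshold value are hypothetical and not taken from the paper.

```python
def importance_scores(weights):
    """weights[i][j]: learned weight between input node i and upper-layer node j.

    The score of an input node is the sum of the absolute values of its weights.
    """
    return [sum(abs(w) for w in row) for row in weights]


def select_by_threshold(scores, threshold):
    """Indices of inputs whose score reaches the threshold, most important first."""
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    return sorted(kept, key=lambda i: scores[i], reverse=True)


# Hypothetical weights for 4 input proteins and 3 hidden nodes.
weights = [
    [0.10, -0.20, 0.05],
    [0.90, -0.70, 0.40],
    [0.00, 0.10, -0.10],
    [-0.50, 0.30, 0.20],
]
scores = importance_scores(weights)
selected = select_by_threshold(scores, threshold=0.5)
```

Raising the threshold shrinks the selected subset, which is how the trade-off between subset size and diagnostic accuracy can be explored.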

DBN-based feature selection for identifying proteomic risk marker

This paper proposes a six-step feature selection method based on a designed feature importance criterion and a method for constructing candidate feature subsets. Table 1 describes the details of the algorithm. The weights between the nodes of the input layer and the hidden layer of the trained model indicate the importance of protein expression values in AD diagnosis. After the proteins are ranked by importance, the important ones are selected in turn to construct candidate feature subsets. The AD prediction method is trained on each subset to form a new model, and the performance of these models is compared by 10-fold cross-validation. The algorithm selects the optimal feature subset, the one that gives the model the highest AD diagnosis performance. At the expense of additional computational burden, the whole procedure can be repeated multiple times (in an outer cross-validation loop) to reduce the variance of the performance estimate. Extensive experiments show that such an algorithm works well in practice. The training time is tolerable because of the compact network size. The experiments suggest that the performance can be improved simply by waiting for bigger datasets to become available.

Table 1. The proposed feature selection method.

Input: Dataset $G = \{X, Y\}$, consisting of samples with protein expression values and labels
Output: Feature subset that contains $k$ features

Step 1. Data preparation
Divide the dataset $G$ into a training dataset $\{X_{train}, Y_{train}\}$ and a testing dataset $\{X_{test}, Y_{test}\}$. $X_{train}$ is a matrix in which column $X_i$ holds the expression values of the $i$-th protein in all samples. $Y_{train}$ is the label indicating the cognitive status of the samples.

Step 2. Feature importance criterion
Choose the weights between input nodes and hidden nodes of the proposed model as the feature importance criterion $R(X_i, Y)$, $1 \le i \le N$, where $N$ is the total number of proteins.

Step 3. Optimal model selection
Train the DBN model on the training dataset. Initialize the NN with the parameters of the DBN. Train the NN on the training dataset and adjust the hyperparameters to obtain the best AD diagnosis performance.

Step 4. Candidate feature subset construction
Sort the features based on the feature importance criterion, $R(X_{r_1}, Y) \le R(X_{r_2}, Y) \le \dots \le R(X_{r_{N-1}}, Y) \le R(X_{r_N}, Y)$, and construct the candidate feature subsets $S = \{S_1, S_2, \dots, S_N\}$, in which $S_1 = \{X_{r_1}\}$, $S_2 = \{X_{r_1}, X_{r_2}\}$, ..., $S_N = \{X_{r_1}, X_{r_2}, \dots, X_{r_{N-1}}, X_{r_N}\}$.

Step 5. Optimal feature subset identification
For each $j$, $1 \le j \le N$:
    Train the NN-based model on feature subset $S_j$ using 10-fold cross-validation.
    Record the accuracy $ACC(j)$ of the model.
End
Select the smallest feature set $S_{optimal}$ that has the highest accuracy.

Step 6. Optimal feature subset evaluation
Train the NN-based model on the training dataset using the feature subset $S_{optimal}$.
Test its performance on the testing dataset.
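Steps 4 and 5 of Table 1 can be sketched as a short loop over nested feature subsets. This is an illustrative sketch, not the authors' code: the `evaluate` callback stands in for training the NN-based model with 10-fold cross-validation, the accuracies are made up, and the ranked list is assumed to place the most important feature first, following the prose description of selecting important proteins in turn.

```python
def nested_subsets(ranked_features):
    """S_1, S_2, ..., S_N: prefixes of the ranked feature list."""
    return [ranked_features[:j] for j in range(1, len(ranked_features) + 1)]


def select_optimal_subset(ranked_features, evaluate):
    """Return the smallest candidate subset with the highest accuracy.

    `evaluate` is a hypothetical callback that returns the cross-validated
    accuracy of a model trained on the given feature subset.
    """
    best_subset, best_acc = None, float("-inf")
    for subset in nested_subsets(ranked_features):
        acc = evaluate(subset)
        # Strict '>' keeps the earlier (smaller) subset when accuracies tie.
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Stand-in evaluator with made-up accuracies, for illustration only.
fake_accuracy = {1: 0.70, 2: 0.85, 3: 0.92, 4: 0.92, 5: 0.90}
ranked = ["p1", "p2", "p3", "p4", "p5"]  # assumed: most important first
subset, acc = select_optimal_subset(ranked, lambda s: fake_accuracy[len(s)])
```

Because the subsets are visited from smallest to largest and ties do not replace the current best, the three-feature subset is preferred here over the equally accurate four-feature one.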

The causes of AD are still unknown but might have some connection with genes related to cholesterol [56]. Taking this possibility into account, this paper constructs a more compact deep architecture with a smaller number of hidden nodes. This architecture ensures that each hidden unit has a similar impact on the output layer. After training the network, this paper picks out the most crucial input attribute, namely the one whose weights to the second hidden layer are the largest. In other words, it is the gene most relevant to AD within the candidate set.

Results

Dataset

Experimental settings

This section presents the experimental settings and the results of AD prediction and feature selection. The hyperparameters of the proposed model are determined by 5-fold cross-validation on the dataset. During model training, the loss function is calculated from the result of the forward propagation of each batch of training data, and the parameters of the model are then updated using gradient descent. A learning rate defines the amplitude of each parameter update. If the learning rate is too low, network optimization slows down and training time increases. Conversely, if the learning rate is too high, the network parameters may swing back and forth on both sides of the optimal value, so that the network fails to converge. This paper therefore uses a learning rate that decays with the number of iterations. The number of epochs is set to 100.

The model performance on AD phenotype prediction is evaluated by 10-fold cross-validation. As shown in Figure 2, the accuracies of three classic machine learning methods, SVM, KNN, and BPNN, are relatively low, and these methods are unable to meet the needs of clinical medicine. The proposed method is consistently better than all three.
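As an illustration of a learning rate that decays with the number of iterations, the sketch below uses exponential decay inside a plain gradient descent loop. The paper does not give its decay formula or settings, so the schedule, the parameter values, and the toy objective are assumptions chosen only to show the mechanism.

```python
def decayed_learning_rate(initial_rate, decay_rate, decay_steps, iteration):
    """Exponential decay: the rate is multiplied by `decay_rate` every `decay_steps` iterations."""
    return initial_rate * decay_rate ** (iteration / decay_steps)


# Hypothetical settings: start at 0.1 and halve the rate every 1000 iterations.
rates = [decayed_learning_rate(0.1, 0.5, 1000, it) for it in (0, 1000, 2000)]

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w = 0.0
for it in range(200):
    grad = 2.0 * (w - 3.0)
    w -= decayed_learning_rate(0.4, 0.5, 50, it) * grad
```

Large early steps move the parameter quickly toward the optimum, while the smaller later steps reduce the oscillation around it that the text describes.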

Figure 2: Performance comparison of the proposed method, SVM, KNN, and BPNN on AD diagnosis.

Comparison of the prediction performance on the data before and after using feature selection

Figure 3 and Figure 4 indicate that the AD diagnosis performance of the proposed model based on the selected 20 signaling proteins is comparable with its performance using all 120 signaling proteins. The accuracy in both cases was greater than 90%, which is significantly better than that of traditional machine learning methods. Therefore, identifying a small group of signaling proteins can reduce model complexity and data collection expenses while achieving high diagnostic performance.

Figure 3: The AD diagnostic error based on 120 signaling proteins.

Figure 4: The AD diagnostic error based on the selected 20 signaling proteins.

Identifying ACRP30, a risk marker in AD

This paper uses the hidden layer of the DBN model to learn a high-level feature representation of the signaling proteins. The DBN model then uses these features to complete the AD diagnosis task. These feature representations can also be used to reconstruct the raw expression values of the signaling proteins. The proposed method uses the reconstruction error, calculated as the difference between the raw expression values and the values reconstructed from the high-level feature representation, to adjust the parameters of the prediction model for better diagnostic performance. Meanwhile, optimizing the model for minimum reconstruction error during training helps learn a compact and AD-specific feature representation that retains as much raw information as possible. The optimal parameters of the model yield the best feature representation with the smallest error.

This section uses the average reconstruction error to evaluate the effectiveness of model training and the reliability of the feature transformation. If the average reconstruction error of the trained model is small, the learned high-level feature representation is credible. Conversely, if the error is high, the trained model has not learned a high-quality feature representation and does not fit the data well.

This paper minimizes the average reconstruction error by adjusting the number of hidden layer nodes. Figure 5 shows that the average reconstruction error is below 0.05 within the 100 epochs. This low error indicates that the model is well trained and can learn the desired high-level features. Figure 6 shows the AD diagnostic error based on the expression values of 120 signaling proteins in the second model.
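The average reconstruction error can be computed as sketched below. The paper does not say which error measure it uses, so the mean squared difference here is an assumption, as are the function name and the example values; only the 0.05 level comes from the text.

```python
def average_reconstruction_error(originals, reconstructions):
    """Mean squared difference between raw inputs and their reconstructions."""
    total, count = 0.0, 0
    for x, x_hat in zip(originals, reconstructions):
        for a, b in zip(x, x_hat):
            total += (a - b) ** 2
            count += 1
    return total / count


# Hypothetical normalized expression values for 2 samples and 3 proteins.
originals = [[0.2, 0.8, 0.5], [0.9, 0.1, 0.4]]
reconstructions = [[0.25, 0.75, 0.5], [0.85, 0.2, 0.4]]
error = average_reconstruction_error(originals, reconstructions)
well_trained = error < 0.05  # 0.05 is the level reported for Figure 5
```

Tracking this value per epoch for different numbers of hidden nodes gives a simple way to compare candidate architectures.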


Figure 5: Average reconstruction error based on 120 signaling proteins in the second model.

Figure 6: The AD diagnostic error based on 120 signaling proteins in the second model.

This paper attempts to identify the relevant biomarkers and determine the etiology of AD by constructing a compact deep model that has few hidden nodes. The model learns some features hidden in the original data in an unsupervised way. When the training process of the DBN completes, we obtain the weights of the connections between the nodes of adjacent layers. These weights indicate the contribution of the nodes to the final prediction results. Figure 7 shows the visualization of the weights between nodes of the input layer and nodes of the hidden layer.

Figure 7: Visualization of the weights between nodes of the input layer and nodes of the hidden layer.

For each level, this paper creates a gray-level image by mapping the weights to gray levels such that darker gray levels stand for lower weights. A key advantage of such barcode images is that they provide an intuitive, informative, and global view of genomes, from which the importance of various genes becomes immediately apparent. The horizontal and vertical axes represent the index of the first-hidden-layer node and the type of gene expression data, respectively, comprising a 120×5 weight matrix. It is worth noting that the brightest line in the picture is the sixty-first, which corresponds to ACRP30. This finding indicates a strong association between ACRP30 and AD, which may help establish an early diagnostic marker of AD. These findings are consistent with previous hypotheses, reinforcing the role of adipokines in the pathological mechanism of AD. Figure 8 shows the scores of five proteins: ACRP30, TIMP-2, HGF, Eotaxin, and IGFBP-1. ACRP30 has the highest score.

Figure 8: The calculated importance scores of 5 proteins.

Correlation is a statistical measure used to assess the degree of linear correlation between two continuous variables. The correlation coefficient, a dimensionless quantity with values between -1 and 1, indicates the strength of the correlation. A correlation coefficient of 0 indicates that there is no linear correlation between the variables, while a coefficient of -1 or 1 means that there is a perfect linear correlation. Any value between -1 and 1 can represent the strength of the correlation: the stronger the correlation, the closer the absolute value of the correlation coefficient is to 1.

This paper calculates Spearman’s correlation coefficient, which applies this measure to the ranks of the values, to compare with the proposed feature selection method. Table 2 indicates that Spearman’s correlation coefficient between ACRP30 and AD is 0.048, with a two-tailed P-value of 0.523, so the correlation is not significant. Thus, ACRP30 and AD show no significant rank correlation, which suggests that the proposed feature selection method can detect some associations that Spearman’s correlation coefficient cannot.
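For reference, Spearman's coefficient can be computed as the Pearson correlation of the rank-transformed values, as in the sketch below. The function names and the example data are illustrative assumptions; the values are made up and do not reproduce Table 2, and the sketch does not compute the P-value.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up expression values and binary labels (1 = AD, 0 = NDC).
expression = [0.3, 1.2, 0.8, 2.5, 1.9, 0.5]
label = [0, 1, 0, 1, 0, 1]
rho = spearman(expression, label)
```

Because the coefficient scores each protein against the label in isolation, it cannot reflect interactions among proteins, which is the kind of association the weight-based criterion is meant to capture.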

Table 2. Spearman’s correlation coefficient between ACRP30 and AD.

Spearman's rho                        ACRP30    AD
ACRP30    Correlation coefficient     1.000     .048
          Sig. (2-tailed)             .         .523
          N                           180       180
AD        Correlation coefficient     .048      1.000
          Sig. (2-tailed)             .523      .
          N                           180       180

Biological explanation

ACRP30 is the most abundant adipocytokine in plasma, where it occurs in many forms [39]. As a circulating protein, it is synthesized in adipose tissue, and there is no clear evidence that it is produced in the brain. ACRP30 plays a central regulatory role in energy homeostasis. Some studies have also validated ACRP30’s neuroprotective role [40, 41]. Researchers suspect that ACRP30 may have multiple roles in neurodegenerative disorders, including AD. Juhyun concludes that it controls the microglia function of the brain [42]. As a surrogate marker, a decreased adiponectin/ACRP30 level could indicate AD pathological changes and link clinical comorbidities, inflammation, and cognitive dysfunction [43]. The process of receptor activation stimulates the intracellular catabolism of ceramide [44]. Many clinical studies have explored its role in AD pathology but have not reached a consistent conclusion. There is evidence that AD patients have a lower adiponectin level than healthy people [57]. However, adiponectin does not predict dementia progression [45].

In addition to regulating the disease directly, ACRP30 also controls it indirectly by regulating other factors that affect the pathogenesis. Credible research has shown that the accumulation of amyloid-β peptide (Aβ) is a critical marker in the pathogenic process of AD, and ACRP30 could reduce Aβ generation and accumulation [46]. ACRP30 could also improve insulin resistance in the central nervous system [47], prevent nerve cell apoptosis [48, 49] and vascular atherosclerosis, and regulate glycolipid metabolism [50, 51]. A study in the Japanese population shows that the serum adiponectin level correlates positively with HDL-cholesterol [52]. A clinical study has shown that a high plasma cholesterol level is a risk factor for AD [53]. However, the cholesterol-AD hypothesis presents some difficulties. There is no consensus that the cholesterol level in plasma is indicative of cholesterol metabolism in AD brains, and some studies have reported no relation between serum cholesterol levels and AD. Our results provide counterevidence in support of a link.
Several epidemiological studies have linked obesity to AD [54], but the role of adiposity across the course of cognitive decline is not well understood to date. Coronary artery disease patients have low plasma adiponectin levels. Moreover, an association between hypoadiponectinemia and carotid atherosclerosis has been suggested [58]. The present study demonstrates that the plasma adiponectin level is positively related to sex, HDL-cholesterol, and BMI. Therefore, based on the results of this paper, there is a significant indication that cholesterol is an AD risk marker. ACRP30 has a direct or indirect regulatory effect on the AD pathological process, and this paper designs a new feature selection method to support this finding from the perspective of data science. ACRP30 can therefore serve as a useful therapeutic target to alleviate AD manifestations.

Conclusions

This paper uses a novel method based upon DBN to advance the understanding of AD diagnosis as it relates to blood plasma protein levels. Even more important than the precision of this method is its generalizability. Owing to the structural characteristics of the DBN, the method learns the nature of expression data automatically, which makes it promising for medical applications in the era of big data. This paper proposes a feature selection algorithm that ranks the features according to their weights in a deep network. The size of the feature subset can be varied to balance diagnostic accuracy against the complexity of data collection. The potential correlation between ACRP30 and AD has thereby been demonstrated using computational science for the first time, which suggests its potential biological significance. The experiments suggest that the proposed method is significantly better than classical machine learning classification methods, including KNN, SVM, and BPNN. One can obtain similar forecasting performance using a subset of features, which significantly reduces the number of indicators that must be collected to predict the disease.

The proposed feature selection method offers the potential to overcome the problems that traditional approaches have with feature dimensionality and data sets of limited size. This method also simplifies the measurement indexes required for diagnosis: one could select a subset of factors based on the expected diagnostic accuracy. This paper shows ACRP30 to be a causative factor in AD by both unsupervised and deep learning methods. The study also demonstrates that obesity and cholesterol are risk factors for AD. These results enhance the genetic knowledge of AD and point out that adiponectin-linked pathways could be a therapeutic drug target.

ACKNOWLEDGMENTS

This work was supported in part by the National Key R&D Program of China under Grant No. 2018YFB1003204, CAMS Initiative for Innovative Medicine (CAMS-I2M, No. 2016-I2M-1-004), the Anhui Provincial Key Project of Research and Development Plan under Grant No. 1704e1002221, and the Programme of Introducing Talents of Discipline to Universities ("111 Program") under Grant No. B14025.

REFERENCES

[1] Prince M, Comas-Herrera A, Knapp M, et al. World Alzheimer report 2016: improving healthcare for people living with dementia: coverage, quality and costs now and in the future[J]. 2016.
[2] Fratiglioni L, De Ronchi D, Agüero-Torres H. Worldwide prevalence and incidence of dementia. Drugs Aging 1999; 15: 365-75.
[3] Fratiglioni L, Rocca W. Epidemiology of dementia. In: Boller F, Cappa SF, editors. Handbook of neuropsychology: aging and dementia. Amsterdam: Elsevier Sc Publ; 2001. p. 193 –
[4] Campion D, Dumanchin C, Hannequin D, et al. Early-onset autosomal dominant Alzheimer disease: prevalence, genetic heterogeneity, and mutation spectrum[J]. The American Journal of Human Genetics, 1999, 65(3): 664-670.
[5] Brickell K L, Steinbart E J, Rumbaugh M, et al. Early-onset Alzheimer disease in families with late-onset Alzheimer disease: a potential important subtype of familial Alzheimer disease[J]. Archives of neurology, 2006, 63(9): 1307-1311.
[6] Bird T D. Genetic aspects of Alzheimer disease[J]. Genetics in Medicine, 2008, 10(4): 231-239.
[7] Roses A D. On the discovery of the genetic association of Apolipoprotein E genotypes and common late-onset Alzheimer disease[J]. Journal of Alzheimer's Disease, 2006, 9(3 Supplement): 361-366.
[8] Kamboh M I. Molecular genetics of late-onset Alzheimer's disease[J]. Annals of human genetics, 2004, 68(4): 381-404.
[9] Bertram L, Tanzi R E. Alzheimer's disease: one disorder, too many genes?[J]. Human molecular genetics, 2004, 13(suppl_1): R135-R141.
[10] Roses A D, Saunders A M, Alberts M A, et al. Apolipoprotein E E4 allele and risk of dementia[J]. Jama, 1995, 273(5): 374-375.
[11] Schellenberg G D. Genetic dissection of Alzheimer disease, a heterogeneous disorder[J]. Proceedings of the National Academy of Sciences, 1995, 92(19): 8552-8559.
[12] Selkoe D J. Alzheimer’s Disease: Genes, Proteins, and Therapy[J]. Physiological reviews, 2001, 81(2): 741-766.
[13] Couzin J. Once shunned, test for Alzheimer's risk headed to market[J]. Science, 2008, 319(5866): 1022-1023.
[14] Coon K D, Myers A J, Craig D W, et al. A high-density whole-genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer's disease[J]. The Journal of clinical psychiatry, 2007, 68(4): 613-618.
[15] Ronald T, National Institute on Aging Working Group. Consensus report of the working group on: “Molecular and biochemical markers of Alzheimer’s disease”[J]. Neurobiology of Aging, 1998, 19(2): 109-116.
[16] Andreasen N, Vanmechelen E, Van de Voorde A, et al. Cerebrospinal fluid tau protein as a biochemical marker for Alzheimer’s disease: a community based follow up study[J]. Journal of Neurology, Neurosurgery & Psychiatry, 1998, 64(3): 298-305.
[17]

Tapiola T, Overmyer M, Lehtovirta M, et al. The level of cerebrospinal fluid tau correlates with neurofibrillary tangles in Alzheimer's disease[J]. Neuroreport, 1997, 8(18): 3961-3963. [18]

Andreasen N, Hesse C, Davidsson P, et al. Cerebrospinal fluid β -amyloid (1-42) in Alzheimer disease: differences between early-and late-onset Alzheimer disease and stability during the course of disease[J]. Archives of neurology, 1999, 56(6): 673-680. [19]

Hulstaert F, Blennow K, Ivanoiu A, et al. Improved discrimination of

Alzheimer’s disease patients using the combined measure of -amyloid (1 –

42) and tau in CSF, a multicenter study[J]. Neurology, 1999, 52: 1555-1562. [20]

German D C, Gurnani P, Nandi A, et al. Serum biomarkers for Alzheimer's disease: proteomic discovery[J]. Biomedicine & pharmacotherapy, 2007, 61(7): 383-389. [21]

Davatzikos C, Bhatt P, Shaw L M, et al. Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification[J]. Neurobiology of aging, 2011, 32(12): 2322. e19-2322. e27. [22]

Nordberg A, Rinne J O, Kadir A, et al. The use of PET in Alzheimer disease[J]. Nature Reviews Neurology, 2010, 6(2): 78-87. [23]

Greicius M D, Srivastava G, Reiss A L, et al. Default-mode network activity distinguishes Alzheimer's disease from healthy aging: evidence from functional MRI[J]. Proceedings of the National Academy of Sciences of the United States of America, 2004, 101(13): 4637-4642. [24]

Westin K, Buchhave P, Nielsen H, et al. CCL2 is associated with a faster rate of cognitive decline during early stages of Alzheimer's disease[J]. PloS one, 2012, 7(1): e30525. [25]

Ray S, Britschgi M, Herbert C, et al. Classification and prediction of clinical Alzheimer's diagnosis based on plasma signaling proteins[J]. Nature medicine, 2007, 13(11): 1359-1362. [26]

Tham A, Nordberg A, Grissom F E, et al. Insulin-like growth factors and insulin-like growth factor binding proteins in cerebrospinal fluid and serum of patients with dementia of the Alzheimer type[J]. Journal of Neural Transmission: Parkinson's Disease and Dementia Section, 1993, 5(3): 165-176. [27]

McKhann G M, Knopman D S, Chertkow H, et al, The diagnosis of dementia due to Alzheimer's disease: Recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease, Alzheimer's & Dementia, 7(3): 263-269 (2011) [28]

Wang, A., Chen, G., Yang, J., Zhao, S., and Chang, C. Y. A comparative study on human activity recognition using inertial sensors in a smartphone [J]. IEEE deep belief network-based method to identify proteomic risk markers for Alzheimer’s disease

Sensors Journal, 2016, 16(11), 4566-4578. [29]

Yang J, Xu Y, Sun G, Shang Y. A new progressive algorithm for a multiple longest common subsequences problem and its efficient parallelization [J]. IEEE Transactions on Parallel and Distributed Systems. 2013 May;24(5):862-70. [30]

S. Rathore, M. Hussain, an d A. Khan, “GECC: gene expression based ensemble classification of colon samples,” IEEE/ACM Trans. Comput. Biol. Bioinformat., vol. 11, no. 6, 1131-1145, Jan./Feb. 2014. [31]

Chaves, Rosa, Javier Ramírez, J. M. Górriz, Carlos García Puntonet, and

Alzheimer’s D isease Neuroimaging Initiative. "Association rule-based feature selection method for Alzheimer’s disease diagnosis." Expert Systems with

Applications 39, no. 14 (2012): 11766-11774. [32]

Yosinski, J, Clune, J, Bengio, Y, and Lipson, H, How transferable are features in deep neural networks? In NIPS (2014) [33]

Farrer L A, Cupples L A, Haines J L, et al. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer’s disease: a meta-analysis [J]. JAMA, 1997, 278(16): 1349-1356. [34]

Corder E H, Saunders A M, Strittmatter W J, et al. Gene dose of apolipoprotein

E type 4 allele and the risk of Alzheimer’s disease in late onset families[J].

Science, 1993, 261(5123): 921-923. [35]

Vincent P, Vincent P, Larochelle H, Extracting and composing robust features with denoising autoencoders, International Conference on Machine Learning (2008) [36]

Hinton G E, Learning multiple layers of representation, Trends in cognitive sciences,11(10): 428-434 (2007) [37]

Hinton, Geoffrey E. "Training products of experts by minimizing contrastive divergence." Neural computation 14, no. 8 (2002): 1771-1800. [38]

Ray S, Britschgi M, Herbert C, et al, Classification and prediction of clinical Alzheimer's diagnosis based on plasma signaling proteins, Nature medicine,13(11): 1359-1362 (2007) [39]

Fruebis J, Tsao T S, Javorschi S, et al. Proteolytic cleavage product of 30-kDa adipocyte complement-related protein increases fatty acid oxidation in muscle and causes weight loss in mice[J]. Proceedings of the National Academy of Sciences, 2001, 98(4): 2005-2010. [40]

Qiu G, Wan R, Hu J, Mattson MP et al (2011) Adiponectin protects rat hippocampal neurons against excitotoxicity. Age 33(2):155 –

165 [41]

Jeon BT, Shin HJ, KimJB et al (2009) Adiponectin protects hippocampal neurons against kainic acid-induced excitotoxicity. Brain Res Rev 61(2):81 –

88 [42]

Song, Juhyun, Seong-Min Choi, and Byeong C. Kim. "Adiponectin Regulates the Polarization and Function of Microglia via PPAR- γ Signaling Under Amyloid β

Toxicity." Frontiers in cellular neuroscience 11 (2017): 64. [43]

Bertram L, Tanzi R, Genetics of Alzheimer's disease, Neurodegeneration: the molecular pathology of dementia and movement disorders, 51-91 (2011) [44]

Holland W, Miller R, Miller R, Receptor-mediated activation of ceramidase activity initiates the pleiotropic actions of adiponectin, Nature Medicine17(1) (2011) [45]

Teixeira A, Teixeira A, Diniz B, Decreased Levels of Circulating Adiponectin in Mild Cognitive Impairment and Alzheimer's Disease, Neuromolecular Medicine15(1): 115-121 (2013) [46]

Gulcelik N E, Halil M, Ariogul S, et al, Adipocytokines and aging: adiponectin and leptin, Minerva endocrinologica, 38(2): 203-210 (2013) [47]

Qiu G, Wan R, Hu J, et al, Adiponectin protects rat hippocampal neurons against excitotoxicity, Age, 33(2): 155-165 (2011) [48]

Jeon B T, Shin H J, Kim J B, et al. Adiponectin protects hippocampal neurons against kainic acid-induced excitotoxicity [J]. Brain research reviews, 2009, 61(2): 81-88. [49]

Qiu G, Wan R, Hu J, et al. Adiponectin protects rat hippocampal neurons against excitotoxicity[J]. Age, 2011, 33(2): 155-165. [50]

Li F Y L, Cheng K K Y, Lam K S L, et al. Cross ‐ talk between adipose tissue and vasculature: role of adiponectin [J]. Acta physiologica, 2011, 203(1): 167-180. [51] Ziemke F, Mantzoros C S. Adiponectin in insulin resistance: lessons from translational research[J]. The American journal of clinical nutrition, 2010, 91(1): 258S-261S. [52]

Yamamoto Y, Hirose H, Saito I, et al. Correlation of the adipocyte-derived protein adiponectin with insulin resistance index and serum high-density lipoprotein-cholesterol, independent of body mass index, in the Japanese population[J]. Clinical Science, 2002, 103(2): 137-142. [53]

Wood W G, Li L, Müller W E, et al. Cholesterol as a causative factor in Alzheimer's disease: a debatable hypothesis [J]. Journal of neurochemistry, 2014, 129(4): 559-572. [54]

Gustafson D R. Adiposity and cognitive decline: underlying mechanisms [J]. Journal of Alzheimer's Disease, 2012, 30(s2). [55]

Huitong D, Ning A, Rhoda A, et al. Exploring the hierarchical influence of cognitive functions for

Alzheimer’s disease: the Framingham Heart Study[J].

Journal of Medical Internet Research, 2020, accepted. [56]

Wollmer M A, Sleegers K, Ingelsson M, et al. Association study of cholesterol-related genes in Alzheimer’s disease[J]. Neurogenetics, 2007, 8(3): 179-188. [57]

Teixeira A L, Diniz B S, Campos A C, et al. Decreased levels of circulating adiponectin in mild cognitive impairment and Alzheimer’s disease[J]. Neuromolecular medicine, 2013, 15(1): 115-121. [58]