Life detection strategy based on infrared vision and ultra-wideband radar data fusion
Li Yin, Ym Zhou
1. Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, Shenzhen 518055, P. R. China
2. Guangdong Provincial Key Lab of Robotics and Intelligent System, SIAT, Chinese Academy of Sciences
3. Key Laboratory of Human-Machine Intelligence Synergic Systems, SIAT, Chinese Academy of Sciences
E-mail: [email protected], [email protected]
Abstract:
Life detection based on a single type of information source cannot meet the requirements of post-earthquake rescue, owing to its limitations in different scenes and its poor robustness. This paper proposes a deep-neural-network method for multi-sensor decision-level fusion that combines a Convolutional Neural Network and a Long Short-Term Memory network (CNN+LSTM). First, we compute the life detection probability of each sensor with its own method in the same scene simultaneously; these values are gathered into samples that serve as inputs to the deep neural network. A Convolutional Neural Network (CNN) then extracts the spatial distribution characteristics from the inputs, which are two-channel combinations of the probability values and the smoothed probability values of each life detection sensor. The temporal relationships in the outputs of these layers are analyzed by Long Short-Term Memory (LSTM) layers, and the results of the three LSTM branches are concatenated. Finally, two further LSTM networks, different from the previous layers, integrate the features of the three branches, and a fully connected network with a Binary Cross-Entropy (BCE) loss function outputs the two-class result. The proposed algorithm therefore yields accurate life detection classification results.
Key Words:
Multi-sensor fusion, Life detection, Deep learning, CNN+LSTM
Studies have shown that China has entered a new period of active seismicity and continually faces the threat of a strong earthquake [1]. How to improve search efficiency, reduce harm to survivors, and determine the existence and location of people in the ruins is a major challenge [2].

Traditional life detection uses a single sensor. Ultra Wide Band (UWB) electromagnetic life detection is an advanced technology that uses the reflection of electromagnetic waves to detect the micro-motion caused by human breathing and heartbeat [3]. Infrared video life detection senses the temperature distribution of a living body and can work at night or under high noise [4]. Weak acoustic wave detection identifies faint sound signals from a living body and usually plays an auxiliary role in rescue [5]. Each sensor has its own advantages and disadvantages, and each plays an important role in life detection [6].

The current technical shortcoming is that a single sensor cannot meet the life detection requirements of today's complex scenes: it relies heavily on the environment and is subject to serious environmental interference. Fusing different sensors is therefore important for improving search accuracy when infrared video, acoustic waves, or other methods are chosen for life detection. For example, where electromagnetic interference is strong, auxiliary detection can be performed with audio signals, and UWB radar can be used where building coverage is complex or thermal noise is relatively large [7].

We propose a life detection method based on the fusion of multi-sensor data such as infrared and UWB radar, which improves robustness and accuracy.
At the same time, the discriminant model proposed in this paper, based on the life detection probabilities fused from multiple sensors, can be deployed conveniently on a mobile life detection platform. For example, with reduced hardware volume it can be installed on a drone; the drone can then report life detection results over a wireless link, based on the detection probability and the drone's location, giving first-aid staff quick and accurate guidance.
This paper proposes a multi-sensor life detection method and strategy. The sensor fusion part uses a three-input, single-output neural network to predict the probability of life existence. The inputs of the neural network are the life detection probability values of each sensor: UWB radar, infrared video, and weak acoustic wave. Among them, UWB radar life detection is the most difficult to process; it adopts Principal Component Analysis (PCA) for dimensionality reduction and a CNN+LSTM feature extraction method to obtain the probability of life detection.
The traditional UWB radar life detection method coherently processes the original signal and the target echo signal to obtain the life detection result. For the vital signal, the radar echo can be regarded as the UWB radar's original signal modulated by the human micro-motion, the surrounding environment, and other clutter signals. It is assumed that the channel impulse response of the UWB radar can be expressed as:

h(\tau, t) = a_0 \delta(\tau - \tau_0(t)) + \sum_i a_i \delta(\tau - \tau_i(t))   (1)

Here \tau is the fast-time variable of the UWB radar, which represents distance information, and t is the slow-time variable, whose sampling rate is the Pulse Repetition Frequency (PRF). Assuming that the radar transmits a signal p(\tau), the echo signal is expressed as:

R(\tau, t) = p(\tau) * h(\tau, t) = a_0 p(\tau - \tau_0(t)) + \sum_i a_i p(\tau - \tau_i)   (2)

R(m, n) = r(mT_s, nT_f)   (3)

R(m, n) = a_0 p(nT_f - \tau_0(mT_s)) + \sum_i a_i p(nT_f - \tau_i)   (4)

where T_s and T_f are the slow-time and fast-time sampling intervals, m = 0, 1, ..., M-1, n = 0, 1, ..., N-1, and R(m, n) is the radar echo matrix that carries the vital sign information.

The radar signal is non-stationary and nonlinear, so the characteristics of its frequency region cannot be effectively separated with the wavelet transform. PCA can separate the components of the echo signal, suppress the clutter, and improve the signal matrix [8].
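As an illustration of the PCA clutter suppression step, the following sketch (not the authors' exact pipeline) removes static clutter as the slow-time mean and treats the largest remaining principal component as residual clutter; both assumptions are scenario-dependent.

```python
import numpy as np

def pca_clutter_suppress(R, drop=1):
    """Suppress clutter in an echo matrix R (slow time x fast time):
    remove the slow-time mean (static background) and zero out the
    `drop` largest principal components of the remainder via SVD.
    Which components hold clutter vs. vital signs is an assumption."""
    X = R - R.mean(axis=0, keepdims=True)      # static clutter removed
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[:drop] = 0.0                             # drop dominant varying clutter
    return (U * s) @ Vt                        # reconstructed echo matrix
```

In practice the number of dropped components would be tuned, since the respiration and heartbeat micro-motion may share variance with environmental motion.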
By the PCA method, the echo matrix can be projected onto the subspace spanned by the vectors with the largest variance, so that the radar signal can be reconstructed with a higher Signal-to-Noise Ratio (SNR).

Empirical Mode Decomposition (EMD) is a signal decomposition method based on the time scale of the data itself, which obtains a series of Intrinsic Mode Functions (IMFs) containing local features of the signal at different time scales [9]. The EMD algorithm can also avoid the instantaneous-frequency fluctuation caused by signal asymmetry while defining the instantaneous frequency. This is because each IMF component must satisfy the following two conditions:

• Over the entire data record, the number of extreme points (maxima and minima) is equal to, or differs by at most one from, the number of zero crossings, and the minima and maxima must admit envelopes.
• At any moment, the mean of the upper and lower envelopes formed by the local extreme points is zero.

The EMD decomposition is also called a sifting process, which has two functions: it removes superimposed waveforms and makes the data more symmetrical. The specific steps are as follows:

• Find all the maxima and minima of s(t), and fit them to obtain the upper and lower envelope curves l_max and l_min respectively.
• Compute the mean m_1 = (l_max + l_min)/2 and subtract it from the data: h_1 = s(t) - m_1. If h_1 satisfies the IMF constraints, mark IMF_1 = h_1.
• Continue with d_1 = s(t) - IMF_1.
• Repeat the above steps until d_n can no longer be decomposed; d_n is the trend of s(t). The reconstructed data can then be expressed as: x(t) = d_n + \sum_{i=1}^{n} IMF_i.

After the signal is decomposed with EMD, the IMF components are obtained sequentially from high frequency to low frequency.
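The sifting steps above can be sketched as follows. This is a minimal EMD, assuming cubic-spline envelopes and a fixed number of sifting iterations in place of a formal IMF stopping criterion; function names are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(s, t):
    """One sifting pass: subtract the mean of the upper/lower envelopes."""
    maxima = argrelextrema(s, np.greater)[0]
    minima = argrelextrema(s, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema to build envelopes: s is a trend
    l_max = CubicSpline(t[maxima], s[maxima])(t)  # upper envelope
    l_min = CubicSpline(t[minima], s[minima])(t)  # lower envelope
    return s - (l_max + l_min) / 2.0              # h = s(t) - m

def emd(s, t, max_imfs=6, sift_iters=8):
    """Decompose s(t) into IMFs plus a residual trend d_n."""
    imfs, residual = [], s.copy()
    for _ in range(max_imfs):
        h = sift_once(residual, t)
        if h is None:            # residual has become the monotonic trend d_n
            break
        for _ in range(sift_iters - 1):
            h_next = sift_once(h, t)
            if h_next is None:
                break
            h = h_next
        imfs.append(h)           # IMF_i, ordered high frequency -> low
        residual = residual - h
    return imfs, residual
```

By construction the trend plus the IMFs reconstructs the input exactly, matching x(t) = d_n + \sum_i IMF_i.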
Each IMF component represents a different vital sign; the heartbeat frequency is higher than the respiratory frequency, so a life detection prediction can be made from the frequencies of the IMF components.

However, EMD decomposition is computationally intensive, and it is complicated to reconstruct the vital sign signal and select target echo characteristics such as respiration and heartbeat. This paper proposes extracting features from the radar signal with CNN+LSTM instead. With the echo signal preprocessed by PCA, the CNN can readily extract the important features of the weak signals and compensate for missing feature modes in feature extraction. The LSTM network can strengthen the feature relationships between time scales and reduce instantaneous-frequency fluctuations without reconstructing the vital signs. The disadvantage is that the signal cannot be symmetrized as in EMD; however, the CNN+LSTM method does not rely heavily on signal symmetry, thanks to the affine transformations between neural network layers.

We use PCA to preprocess the data and suppress the clutter component in the echo signal [10], applying the same operation to the original UWB signal. To form the neural network input samples, we combine the original wave signal and the echo signal into a two-channel radar signal. The CNN extracts the two-channel signal features, and the LSTM analyzes the correlation between the two channels across time scales. The network then determines whether a living body is present with a supervised classifier that outputs a life detection probability. The design of the proposed network is shown in Fig. 1.

For the infrared video, this paper uses the background difference method, target feature extraction, and Support Vector Machine (SVM) classification to obtain the life detection probability value.
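A minimal sketch of the two-channel CNN+LSTM radar classifier of Fig. 1 is given below; the layer counts, channel widths, and hidden sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class UWBLifeNet(nn.Module):
    """Two-channel (original + echo) CNN+LSTM classifier that outputs
    a life-detection probability. All sizes are illustrative."""
    def __init__(self, hidden=32):
        super().__init__()
        # Conv1d over the 2-channel radar signal extracts local features
        self.cnn = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # LSTM models the correlation across the time axis
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):            # x: (batch, 2, time)
        f = self.cnn(x)              # (batch, 32, time)
        f = f.transpose(1, 2)        # (batch, time, 32)
        _, (h, _) = self.lstm(f)     # h: (1, batch, hidden)
        return self.head(h[-1])      # (batch, 1) probability
```

Trained with a binary cross-entropy loss, the sigmoid output serves directly as the radar branch's life detection probability for the later fusion stage.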
Human target detection is mainly divided into a Region Of Interest (ROI) segmentation process and a classification process. Infrared image sequence ROI segmentation methods mainly include the optical flow method, the frame difference method, and the background difference method [11]. The background difference method differences the current frame against a reference background.

Fig. 1: The UWB radar based life detection
Fig. 2: The infrared video based life detection

Common methods for classifying human targets include template-matching methods, recognition methods based on target motion information, and methods based on target feature extraction and classification such as SVM.

An adaptively updated GMM algorithm is used to segment the ROI in the infrared image sequence to obtain candidate human targets. Then the AOE-HOG features extracted from the candidates are classified by an SVM, which improves real-time performance and accuracy to a certain extent.

The SVM [12] is a supervised machine learning method. It classifies the target region by constructing a classification boundary; that is, a series of positive and negative training samples are used to train and optimise the classifier. The infrared video processing pipeline is shown in Fig. 2.

When analyzing the infrared video, we obtain the living-body probability value from the network by placing a classifier at the end of the neural network, which is convenient for multi-sensor fusion [13].
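A hedged sketch of the feature-plus-SVM stage: the gradient-orientation histogram below is a simplified stand-in for the AOE-HOG descriptor (the real descriptor is cell/block structured), and Platt scaling in scikit-learn's SVC supplies the probability value used later for fusion.

```python
import numpy as np
from sklearn.svm import SVC

def grad_orientation_hist(img, bins=9):
    """Simplified stand-in for AOE-HOG: a global histogram of gradient
    orientations weighted by gradient magnitude (illustration only)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)

# Train an SVM on features of candidate ROIs (labels: 1 = human, 0 = not).
# Random data here merely stands in for segmented infrared ROI patches.
rng = np.random.default_rng(0)
X = np.stack([grad_orientation_hist(rng.standard_normal((24, 24)))
              for _ in range(40)])
y = np.array([0, 1] * 20)
clf = SVC(probability=True).fit(X, y)         # Platt scaling -> probabilities
probs = clf.predict_proba(X)[:, 1]            # per-ROI life probability
```

The `probability=True` option fits a sigmoid on the SVM margins, so the infrared branch can hand a calibrated probability value to the decision-level fusion network rather than a hard label.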
The sound signal can also be used for life detection: various informative acoustic waves can be analyzed, such as low-frequency interval tapping and the weak physiological signals of trapped people, such as respiration and heartbeat. Life detection can be achieved from these two kinds of vital signals.

We first adopt a wavelet-transform method to threshold out the noise in the original signal. The inverse wavelet transform then yields the required effective signal with little noise, which helps feature extraction and classification. We use Independent Component Analysis (ICA) to separate the spectrum, because mixed multi-source signals are often difficult to separate; ICA is a multi-channel blind source separation method developed from blind source separation technology [14]. The original signals must be mutually independent to satisfy the requirements of this separation [15].

Fig. 3: The acoustic wave based life detection

Acoustic life detection, like UWB radar life detection, also uses the correlation analysis principle on the original signal. The correlation function is defined as:

r_{xy}(m) = \sum_{n=-\infty}^{\infty} x(n) y(n+m)   (5)

where x(n) and y(n) are mutually correlated and r_{xy}(m) is the value at lag m, which can determine whether there is life by comparing the similarity of the ICA-separated signals. If multiple separated signals are all similar, there is no life; otherwise there may be life, although the accuracy of this rule alone is not high.

In this paper, a convolutional neural network replaces the multi-channel independent component analysis to calculate the probability that a living body exists, with higher accuracy and lower complexity.
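The similarity test of Eq. (5) can be written directly; the normalized peak over lags is an added convenience (an assumption, not from the paper) for comparing ICA-separated channels.

```python
import numpy as np

def r_xy(x, y, m):
    """Cross-correlation r_xy(m) = sum_n x(n) y(n+m) from Eq. (5),
    with zero padding outside the record."""
    n = np.arange(len(x))
    valid = (n + m >= 0) & (n + m < len(y))
    return float(np.sum(x[valid] * y[n[valid] + m]))

def max_similarity(x, y, max_lag=50):
    """Peak normalized correlation over lags; a value near 1 means the
    two ICA-separated channels carry essentially the same source."""
    norm = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)) + 1e-12
    return max(abs(r_xy(x, y, m)) for m in range(-max_lag, max_lag + 1)) / norm
```

Under the rule in the text, if every pair of separated channels scores near 1 the mixture holds a single ambient source (no life); a dissimilar channel may be a vital signal.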
The specific process is shown in Fig. 3.

This paper proposes a life detection method and strategy based on multi-sensor fusion, arrived at by comparing the basic theory of Dempster-Shafer (D-S) with the neural network method; it uses a three-input, single-output, two-class neural network to predict the probability of a living body. The three sensors process their signals separately to obtain single-sensor life detection probability values before entering decision-level data fusion. This CNN+LSTM network can combine the advantages of multiple sensors to achieve their optimal state, compensates for the fixed-rule shortcomings of D-S evidence theory in decision-level fusion, and can be extended to more sensors with high scalability [16].

The traditional decision-level data fusion method for life detection generally uses D-S evidence theory, so that the most reliable life detection sensor can be selected. The D-S synthesis rule combines multiple subjects, such as the predictions of different people or of different sensors. With D-S evidence theory for decision-level fusion, the prior data is more intuitive and easier to integrate than in probabilistic reasoning, and a variety of data can be combined, especially heterogeneous data [17]. However, the evidence synthesis rule lacks strong theoretical support, its rationality and effectiveness are still controversial, and its computation suffers from an exponential explosion problem. An artificial neural network, by contrast, can learn non-standard probability distributions. Even if the original data are correlated, their hidden distribution can be captured by the mappings between the network's neurons.
The neural network is strongly nonlinear [18] and can adapt robustly to the irregularity of decision-level reasoning.

The proposed CNN+LSTM sensor fusion network has high performance and robustness [19]. The decision-level fusion method fuses the life detection results of the three sensors into one probability, which gives the most reasonable life detection estimate. To enhance the robustness of the decision-level network, we select G time steps of each sensor's processing results and combine them into a vector of length 2*G [20]. The problem then becomes a long-sequence classification problem with time width G. We use a three-way parallel network whose first layers are convolutional and process the inputs, namely the original and smoothed probability signals. The outputs of the three branches are concatenated, and multiple LSTM networks of different sizes perform inter-sequence analysis on the concatenated data. Finally, a fully connected network outputs the predicted probability value with the BCE loss function, which is suitable for a two-class network. The accuracy depends on the network parameters, so the actual tuning of the network is also quite important.

The input signal of each channel is formed by concatenating the original probability sequence and the smoothed probability sequence into a two-dimensional vector, and a convolutional network is then used for feature extraction. If features come from the original signals alone, whose envelope is sharp, the network output is relatively unstable.
Moreover, the experiments show that without smoothing the network learns nothing but the average probability value. With smoothing, the network not only learns the envelope information but also preserves the original distribution of the signal waveform, which makes the predicted probability values for multiple sensors reasonable.

An advantage of the proposed network is that each sensor can be weighted according to the results of its traditional method, through the BCE loss function of formula (6) below. y_n is the true label, which can be 0 or 1, x_n is the predicted value for the sample, and w_n is an adjustable weight. The weight of a sensor is lowered when it misclassifies at the end, and this weight is not affected by Back Propagation (BP); the function can be realized by configuring a weight list.

A convolutional network that extracts a time series of a certain width is a general method for processing sequence features. It extracts the distribution characteristics of the sequence values, which reflect the nature of the data processed by each sensor. The LSTM network then learns these characteristics; the specific network structure is shown in Fig. 4.

l_n = -w_n [y_n \log x_n + (1 - y_n) \log(1 - x_n)]   (6)

The parameters of the fusion algorithm are particularly important when the classification results are combined with the decision-level fusion method. In the estimation of variable-length sequences, the choice of time width matters most, since it directly determines the validity of the convolutional features. Moreover, the parameters differ between sensors. Since this is decision-level fusion [21], the features within the selected time width may not be obvious; however, the network has a certain workable range of time widths, found empirically.
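The per-sample weighted loss of Eq. (6) can be written out directly, which also makes the role of the configured weight list w concrete:

```python
import numpy as np

def weighted_bce(x, y, w):
    """Per-sample weighted binary cross-entropy, Eq. (6):
    l_n = -w_n [ y_n log x_n + (1 - y_n) log(1 - x_n) ].
    w lowers the influence of a less credible sensor; it comes from a
    configured weight list and is not updated by back propagation."""
    x = np.clip(x, 1e-7, 1 - 1e-7)   # guard against log(0)
    return -w * (y * np.log(x) + (1 - y) * np.log(1 - x))
```

This is numerically the same quantity computed by PyTorch's `torch.nn.BCELoss(weight=w, reduction='none')`, so the same weight list can be reused in the training code.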
The time width was determined experimentally from the characteristics of the network, generally between a width of 64 and a width of 128.

The experimental UWB radar data in this paper comes from a MATLAB radar data simulator. We use a total of 1000 samples of randomly generated waveform data, each 1 s long. 1000 infrared images of human bodies with different postures were captured by a passive thermal infrared camera. The weak acoustic signals were gathered with a weak-audio collector recording the faint sounds of tapping on different objects, again 1000 samples of 1 s each.

Through repeated neural network training, the single-variable method is used to set different parameters, including the window width of the smoothing layer, the dropout rate, the size of the convolution kernels, the number of LSTM layers, the LSTM size of the fusion stage, and the fully connected layer sizes, in order to observe the best network state under different parameters. Simultaneously, we use the BCE loss function and the Adam optimizer with momentum parameters to speed up training. Training uses batch processing of multiple samples, which accelerates the convergence of the neural network.
Data preprocessing is performed with the methods described above for each sensor. The transmitted UWB radar signal and the target echo signal form a dual channel, which is classified by the neural network. AOE-HOG features are extracted from the infrared video and classified with an SVM to obtain the infrared probability value. The acoustic waves are amplified, and their features are separated with the wavelet transform and the ICA method; a convolutional neural network then extracts features to obtain the final acoustic life detection probability value.

Fig. 4: Decision-level data fusion neural network
Fig. 5: Train loss of different neural network parameters

The obtained probability sequences are smoothed with a window of width H, and the smoothed probability sequence and the original probability sequence are concatenated into a two-dimensional vector.

Results & Analysis
Fig. 5 and Fig. 6 show the results of the network components under different network parameters. We set the network structure with a smoothing-layer width of 5, a Conv1d kernel size of 3, a dropout rate of 0.8, and 3 LSTM layers. After concatenating the sensor results, two LSTMs feed a fully connected network with a dense structure of 64-32-16. To compare the impact of different parameters on the proposed network, we vary the number of LSTM layers before the sensor data are concatenated (layer3, layer4, layer5: 3, 4, and 5), the convolution kernel sizes (conv1, conv2: 1 and 3), the dropout rate (two settings), the smoothing window sizes (smooth5, smooth10: 5 and 10), the loss function (MSE), and the optimizer (Adam). The experiments show which network structure is more reasonable according to the results of the different parameters shown in Fig. 5 and Fig. 6.

Data preprocessing:
The probability sequence of each sensor's output is turned into samples by taking windows of the first G values with width G, sliding the window backwards through the data. For a sequence of length 1000, (1000-G) vectors of size 2*G can be taken.

Fig. 6: Test loss of different neural network parameters

The sample sequence is then used for the subsequent convolution calculation. By taking a G-width window of sequence values as one sample, the relationships between the sequence values near the predicted sample can be analyzed well, which is consistent with the characteristics of convolutional and LSTM neural networks.
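The windowing described above can be sketched as follows; the moving-average smoother is an assumed implementation of the width-H smoothing layer.

```python
import numpy as np

def smooth(seq, H):
    """Moving-average smoothing with window width H (same-length output);
    an assumed stand-in for the paper's smoothing layer."""
    return np.convolve(seq, np.ones(H) / H, mode="same")

def make_samples(probs, smoothed, G):
    """Slide a width-G window backwards through a probability sequence and
    its smoothed version, yielding (len(probs) - G) two-channel samples
    of shape (2, G)."""
    n = len(probs) - G
    return np.stack([[probs[i:i + G], smoothed[i:i + G]] for i in range(n)])
```

For a length-1000 sequence and G = 64 this produces 936 samples, matching the (1000-G) count stated above.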
Training:
This experiment uses dual Tesla K80 GPUs as the experimental platform, and it takes one hour to complete in the PyTorch environment with 20 epochs and a batch size of 16. The G*2 probability value vectors of the three sensors are used, where G is the step size in the sequence of probability values. After 10 epochs of training, the neural network slowly reaches a convergence state, and beyond 10 epochs it is basically in the fine-tuning phase. After each epoch of training, the entire data sequence is re-shuffled and the gradient is reduced in batches, so that the characteristics of the data can be fully learned.
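The training setup described above (BCE loss, Adam, batches of 16, per-epoch re-shuffling) can be sketched as below; the tiny linear model is only a stand-in for the fusion network, and the data are random placeholders.

```python
import torch
import torch.nn as nn

# Stand-in model over flattened 2*G sample vectors (G = 64 assumed here).
model = nn.Sequential(nn.Linear(2 * 64, 1), nn.Sigmoid())
opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
loss_fn = nn.BCELoss()

X = torch.randn(256, 2 * 64)               # placeholder sample vectors
y = torch.randint(0, 2, (256, 1)).float()  # life / no-life labels

for epoch in range(3):                     # the paper trains for 20 epochs
    perm = torch.randperm(len(X))          # re-shuffle the whole sequence
    for i in range(0, len(X), 16):         # batch size 16
        idx = perm[i:i + 16]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        opt.step()
```

The momentum behaviour mentioned in the text comes from Adam's `betas`; the batch loop performs the batched gradient reduction described above.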
Feature extraction:
The neural network extracts the distribution characteristics of the two-channel data with a convolutional network. By adopting the dual-channel form, it learns not only the envelope distribution of the smoothed data but also the detailed distribution of the data before smoothing, and generates a feature map with distributed features by integrating the feature maps of the two channels. The obtained feature map is input into the LSTM*3 neural network, whose hidden state size is G, to learn the time-domain characteristics of the data and output a time-domain feature map.

Fig. 7: Probability value sequence fitting
Feature synthesis and probability output:
The three time-domain feature maps output above are concatenated together and passed through two further LSTM networks with a hidden state size of 3*G. The first network integrates the time-domain characteristics of the three feature maps, and the second LSTM outputs only its last state vector (of size 2*G). This last state is passed through a fully connected network with a three-layer architecture of widths 2*G, G, and 0.5*G. We use the BCE loss function, which is well suited to two-class classification, and the probability value output by the fully connected network is used for the two-class decision. At the same time, an advantage of the neural network is that the back propagation algorithm can adjust the weights of the whole network; the BP weighting makes the fusion more intelligent.
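The fusion architecture described above can be sketched in PyTorch as follows. The LSTM hidden sizes (3*G, 2*G) and the 2*G-G-0.5*G fully connected head follow the text; the per-sensor Conv1d branches and their channel widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Three-input, single-output decision-level fusion network:
    per-sensor Conv1d branches over (raw, smoothed) probability sequences
    of time width G, concatenation, two LSTM stages, then a fully
    connected head with a sigmoid for the BCE loss."""
    def __init__(self, G=64):
        super().__init__()
        def branch():  # assumed per-sensor feature extractor
            return nn.Sequential(
                nn.Conv1d(2, 8, kernel_size=3, padding=1), nn.ReLU())
        self.branches = nn.ModuleList([branch() for _ in range(3)])
        self.lstm1 = nn.LSTM(input_size=24, hidden_size=3 * G, batch_first=True)
        self.lstm2 = nn.LSTM(input_size=3 * G, hidden_size=2 * G, batch_first=True)
        self.fc = nn.Sequential(                 # 2*G -> G -> 0.5*G -> 1
            nn.Linear(2 * G, G), nn.ReLU(),
            nn.Linear(G, G // 2), nn.ReLU(),
            nn.Linear(G // 2, 1), nn.Sigmoid())

    def forward(self, xs):                  # xs: list of 3 tensors (batch, 2, G)
        feats = [b(x) for b, x in zip(self.branches, xs)]
        f = torch.cat(feats, dim=1)         # (batch, 24, G)
        f = f.transpose(1, 2)               # (batch, G, 24)
        f, _ = self.lstm1(f)                # integrate the three feature maps
        _, (h, _) = self.lstm2(f)           # keep only the last state (2*G)
        return self.fc(h[-1])               # (batch, 1) fused probability
```

The sigmoid output feeds the BCE loss directly, and back propagation through the whole stack provides the BP weighting of the fusion mentioned above.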
Evaluation metrics:
A comparison between the predicted values and the ground truth is shown in Fig. 7. The red line is the continuous waveform of the predicted values, and the green line is the waveform of the ground truth. The comparison shows that the neural network basically fits the waveform of the sample data; the characteristics of the envelope, as well as the detailed trend of the data, basically meet the requirements of multi-sensor data fitting.
In this paper, a CNN+LSTM neural network model is proposed for decision-level sensor data fusion. The concept of sensor credibility is introduced and applied to the fusion, and sequence vectors of length G are fed into the neural network for processing. The fusion performance and scalability enhance the robustness of the system. First, CNN+LSTM is applied to UWB radar feature extraction and probability calculation, so that the reliability of UWB radar life detection is computed directly by the neural network; this credibility value then serves as a decision-level network input. Feeding the smoothed and unsmoothed signals into the dual-channel CNN makes full use of the temporal and spatial characteristics of the data to regress the input, and the obtained result is in line with expectations. Finally, the three sensor confidence values are processed by the CNN+LSTM network to obtain the final fused life detection probability, which determines whether vital sign information is present. The method can be applied to mobile platforms such as search robots and search-and-rescue unmanned vehicles.
References

[1] Jordan T H. Earthquake predictability, brick by brick. Seismological Research Letters, 2006, 77(1): 3-6.
[2] Hu H B, et al. A review of research of life detecting radar. Applied Mechanics and Materials, 2014, 519-520: 1139-1143.
[3] Liang X, Zhang H, Lyu T, et al. Ultra-wide band impulse radar for life detection using wavelet packet decomposition. Physical Communication, 2018.
[4] Seki M, Fujiwara H, Sumi K. A robust background subtraction method for changing background. In: Applications of Computer Vision, Fifth IEEE Workshop on. IEEE, 2000: 207-213.
[5] Hyvärinen A, Oja E. A fast fixed-point algorithm for independent component analysis. International Journal of Neural Systems, 2000, 10(01): 1-8.
[6] Shukri S, Kamarudin L M. Device free localization technology for human detection and counting with RF sensor networks: A review. Journal of Network and Computer Applications, 2017, 97: 157-174.
[7] Liu B, Wang L, Liu M, et al. Lifelong federated reinforcement learning: A learning architecture for navigation in cloud robotic systems. arXiv preprint arXiv:1901.06455, 2019.
[8] Dai S, Zhu F, Xu Y Y, et al. Vital signal detection method based on principal component analysis and empirical mode decomposition for ultra wideband radar. Dianzi Xuebao (Acta Electronica Sinica), 2012, 40(2): 344-349.
[9] Cui L H, Zhao A X, Ning F Z. Radar vital sign detection method based on the EMD and BP algorithm. Computer Systems & Applications, 26: 217-222.
[10] Lee K C, Ou J S, Fang M C. Application of SVD noise-reduction technique to PCA based radar target recognition. Progress In Electromagnetics Research, 2008, 81: 447-459.
[11] Ye M A, Qing C, Moufa H U. Research on infrared human detection from complex backgrounds. Infrared Technology, 2017.
[12] Burges C J C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998, 2(2): 121-167.
[13] Lipton A J, Fujiyoshi H, Patil R S. Moving target classification and tracking from real-time video. In: Applications of Computer Vision, Fourth IEEE Workshop on (WACV'98). IEEE, 1998: 8-14.
[14] Hyvärinen A, Hoyer P O, Inki M. Topographic independent component analysis. Neural Computation, 2001, 13(7): 1527-1558.
[15] Wang C B, Guo Y, Wang J. Analysis of victim location system signal in earthquake disaster based on acoustic and seismic wave detection. Chinese Journal of Engineering Geophysics, 2005, 2(2): 79-83.
[16] Qiu L, Liu T, Lin N. Data aggregation in wireless sensor network based on deep learning model. Chinese Journal of Sensors and Actuators, 2014, 12: 1704-1709.
[17] Zhi Z, Qinghai Y. Target recognition method based on BP neural networks and improved D-S evidence theory. Computer Applications & Software, 2018.
[18] Bae S H, Choi I, Kim N S. Acoustic scene classification using parallel combination of LSTM and CNN. In: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), 2016: 11-15.
[19] Wang J, Yu L C, Lai K R, et al. Dimensional sentiment analysis using a regional CNN-LSTM model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2016, 2: 225-230.
[20] Xiao-Ling W, Ming L, Xin Y, et al. Atrial fibrillation detection based on multi-feature fusion and convolution neural network. Laser Journal, 2017.
[21] Schlosser J, Chow C K, Kira Z. Fusing LiDAR and images for pedestrian detection using convolutional neural networks. In: Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE, 2016.