Large-Scale Spectrum Occupancy Learning via Tensor Decomposition and LSTM Networks
Mohsen Joneidi*, Ismail Alkhouri*, and Nazanin Rahnavard
Department of Electrical and Computer Engineering, University of Central Florida
Email: {ialkhouri@knights., mohsen.joneidi@, nazanin@eecs.}ucf.edu

*Indicates shared first authorship. This material is based upon work supported by the National Science Foundation under Grant No. CCF-1718195.

Abstract—A new paradigm for large-scale spectrum occupancy learning based on long short-term memory (LSTM) recurrent neural networks is proposed. Studies have shown that spectrum usage is a highly correlated time series. Moreover, there is a correlation between the occupancy of different frequency channels. Therefore, revealing all these correlations using learning and prediction of one-dimensional time series is not a trivial task. In this paper, we introduce a new framework for representing spectrum measurements in a tensor format. Next, a time-series prediction method based on CANDECOMP/PARAFAC (CP) tensor decomposition and LSTM recurrent neural networks is proposed. The proposed method is computationally efficient and is able to capture different types of correlation within the measured spectrum. Moreover, it is robust against noise and missing entries of the sensed spectrum. The superiority of the proposed method is evaluated over a large-scale synthetic dataset in terms of prediction accuracy and computational efficiency.
Index Terms—Spectrum occupancy learning, tensor CP decomposition, LSTM time-series prediction.
I. INTRODUCTION
Spectrum occupancy learning (SOL) aims to extract spectrum usage patterns at each frequency band over time. The learned model of spectrum occupancy facilitates the functionality of dynamic spectrum access. Spectrum sensing, optimal channel selection for opportunistic spectrum access, and resource allocation are some tasks that can be performed more efficiently by the prediction of spectrum usage [1].

The SOL problem can be regarded as time-series learning and prediction, and its performance mainly depends on the underlying model for the time-series analysis. Many statistical models and methods for spectrum usage prediction have been proposed in the last decade [2]. Auto-regressive models, Markov models [3], [4], and neural networks [5], [6] have been exploited as the core models for spectrum time-series prediction. However, spectrum usage is a non-stationary process whose characteristics are time-dependent [7]. Other factors such as users' mobility and diverse user demands make this process more complex. To overcome this challenging problem, deep learning methods have been successfully implemented for capturing spectral usage patterns [8], [9]. Long short-term memory (LSTM) and convolutional neural networks (CNNs) are popular models for learning deep networks in various applications such as computer vision and pattern recognition [10], [11]. However, these methods remain challenging for large-scale learning of spectrum time series. The correlation of spectrum occupancy over time can span a very large range. For example, the averaged spectrum occupancy may correlate with that of one hour ago, but some network activities are daily or weekly [12]. Thus, spectrum occupancy at one time can be related to the occupancy of one day or one week ago as well. Likewise, there might exist spectrum patterns on an even larger time scale. While conventional time-series prediction methods fail to reveal correlations at large lags, LSTM is able to capture these patterns. However, there are two issues in the large-scale data scenario. First, learning and prediction of an extremely long time series implies capturing all the spectrum correlations efficiently, and the computational burden of learning and updating the LSTM model may not be tractable for online tracking of spectrum occupancy. Second, dealing with missing entries in the learning phase is inevitable for a real data sequence, as it affects the prediction accuracy in the test phase. We propose to utilize tensor-based data completion methods, which have attracted much attention for data processing in the presence of missing entries [13].

This paper proposes a new high-dimensional structure for sensed spectrum data in order to improve the accuracy and scalability of LSTM for large-scale SOL. A joint problem of data interpolation and extrapolation (completion and prediction) is introduced. Tensor CP decomposition provides a reliable low-dimensional representation of data, and LSTM performs a fast prediction on the lower-dimensional data (the decomposed factors). Correlations with a long lag in the matrix-based representation are vulnerable to being forgotten. However, these correlations can be identified at a much smaller lag in the third dimension of a tensor. In the present paper, a tensor-based representation of time series is exploited in order to extract some basic time series known as the CP factors of a tensor.
These factors are robust against noise and missing entries. Large-scale prediction of all time series over the long-time dimension only requires prediction of the CP factors of the measured tensor. This significant advantage can be considered a big-data reduction technique.

The main contributions of this paper are summarized as follows:
• A novel time-series prediction framework is proposed based on tensor decomposition and LSTM networks. Our framework can be employed for large-scale spectrum occupancy learning among many other large-scale time-series prediction applications.
• The computational burden is decreased by performing prediction solely on low-dimensional CP factors rather than on high-dimensional raw data.
• The problem of missing samples in time-series prediction is addressed using tensor completion techniques.

Throughout this paper, $\mathcal{X}$ denotes a three-way tensor and $\mathbf{X}$ denotes a matrix. A mode-n fiber of a tensor is a vector obtained by fixing all modes except the n-th mode, and the mode-n matricized version of a tensor is denoted by $\mathbf{X}_{(n)}$. $\mathbf{x}$ and $x$ represent a vector and a scalar, respectively. The Hadamard product, outer product, and Khatri-Rao product are denoted by $*$, $\circ$, and $\odot$, respectively [14].

II. BACKGROUND AND SYSTEM MODEL
In this section, the prerequisite background is presented, and then the system model for spectrum aggregation is explained.
A. Tensor CP Decomposition
A tensor is a multi-dimensional array. Since their introduction, tensors have been utilized in various fields, as they bring a concise mathematical framework for formulating challenging problems involving high-dimensional or big data, especially in signal processing [15]. The CP decomposition factorizes a 3-dimensional tensor
$\mathcal{X} \in \mathbb{R}^{F \times T \times N}$ of rank $R$ into a sum of rank-1 tensors, which can be represented as [14]

$$\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \triangleq \langle \mathbf{A}, \mathbf{B}, \mathbf{C} \rangle, \qquad (1)$$

where $\mathbf{a}_r$, $\mathbf{b}_r$, and $\mathbf{c}_r$ are the CP factors of the $r$-th component and the $r$-th columns of the factor matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, respectively. In other words, $\mathbf{A} = [\mathbf{a}_1 \ \mathbf{a}_2 \ \ldots \ \mathbf{a}_R] \in \mathbb{R}^{F \times R}$; similarly, $\mathbf{B} \in \mathbb{R}^{T \times R}$ and $\mathbf{C} \in \mathbb{R}^{N \times R}$ are defined. The tensor $\mathcal{X}$ can be matricized as [14]

$$\mathbf{X}_{(1)} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T, \quad \mathbf{X}_{(2)} = \mathbf{B}(\mathbf{C} \odot \mathbf{A})^T, \quad \mathbf{X}_{(3)} = \mathbf{C}(\mathbf{B} \odot \mathbf{A})^T. \qquad (2)$$

A powerful property of high-order tensors is that their rank decomposition is unique under milder conditions compared to matrices [16]. These interesting characteristics of tensors have attracted researchers in communication systems for channel estimation and blind coding in MIMO systems [17], [18]. The CP decomposition (CPD) used in this paper is computed by the alternating least squares (ALS) method proposed by Carroll and Chang [19] and Harshman [20]. The goal is to calculate a CPD with $R$ components that best approximates $\mathcal{X}$, i.e., to solve

$$\min_{\hat{\mathcal{X}}} \|\mathcal{X} - \hat{\mathcal{X}}\|_F \quad \text{s.t.} \quad \hat{\mathcal{X}} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r. \qquad (3)$$

The ALS algorithm fixes $\mathbf{B}$ and $\mathbf{C}$ to solve for $\mathbf{A}$, then fixes $\mathbf{A}$ and $\mathbf{C}$ to solve for $\mathbf{B}$, and then fixes $\mathbf{A}$ and $\mathbf{B}$ to solve for $\mathbf{C}$ [14]. We refer to this algorithm as the plain CP algorithm.
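To make the ALS sweeps concrete, the following is a minimal NumPy sketch of the plain CP algorithm built on the matricizations in Eq. (2). The helper names (khatri_rao, unfold, cp_als, cp_reconstruct), the random initialization, and the fixed iteration count are our own illustrative choices, not code from the paper.

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Kronecker product of A (I x R) and B (J x R) -> (IJ x R)
    return np.stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])], axis=1)

def unfold(X, mode):
    # Mode-n matricization X_(n); Fortran order matches the Khatri-Rao convention of Eq. (2)
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

def cp_als(X, R, n_iter=50):
    # Plain CP-ALS: cycle through the factors, solving a linear LS problem for each (Eq. (3))
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((dim, R)) for dim in X.shape)
    for _ in range(n_iter):
        A = unfold(X, 0) @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = unfold(X, 1) @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = unfold(X, 2) @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
    return A, B, C

def cp_reconstruct(A, B, C):
    # <A, B, C>: the sum of R outer products a_r o b_r o c_r, cf. Eq. (1)
    return np.einsum('fr,tr,nr->ftn', A, B, C)
```

In practice one would replace the fixed 50 sweeps with a convergence check on the fit; the sketch keeps it simple.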
[Figure 1 appears here: (a) a matrix-based layout of channels 1..F concatenated over days 1, 2, 3, ...; (b) a tensor-based layout with day, minute (short-time), and channel axes.]

Fig. 1: Comparing two methods of representing F time series. (a) Matrix-based representation. (b) Tensor-based representation.

B. LSTM Network
Neural networks have been recognized as powerful techniques for spectrum pattern learning [21]. Similar to other neural network structures, an LSTM network consists of an input layer, hidden layer(s), and an output layer. The LSTM network was introduced by Hochreiter and Schmidhuber in 1997 as an advanced type of recurrent neural network (RNN) [22]. An RNN (or vanilla RNN) provides the feature of internal memory maintenance, i.e., it saves information from previous time steps. However, this architecture suffers from the gradient explosion problem, which means that the network overwrites its memory in an uncontrolled manner. The main advantage of LSTM is that it fixes this issue of conventional RNNs by adding an adaptive memory unit, which is its key component. This adaptive memory unit controls saving dominant samples and/or forgetting obsolete data. This feature enables LSTM to track information over longer periods of time. The mathematical computation of one memory cell is given in [22].
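As an illustration of how such memory cells are used for one-step-ahead regression, here is a minimal PyTorch sketch. The window length w and the sine toy series are our own assumptions; the 4 layers of 4 hidden units, learning rate 0.05, 300 epochs, and Adam optimizer mirror the settings reported later in Sec. IV.

```python
import torch
import torch.nn as nn

class SeriesLSTM(nn.Module):
    """One-step-ahead predictor for a scalar time series."""
    def __init__(self, hidden_size=4, num_layers=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):              # x: (batch, window, 1)
        out, _ = self.lstm(x)          # out: (batch, window, hidden)
        return self.head(out[:, -1])   # regress the next sample from the last state

# Toy training loop: predict x[t] from the previous w samples of a sine wave.
series = torch.sin(torch.linspace(0.0, 20.0, 200))
w = 10
X = torch.stack([series[i:i + w] for i in range(len(series) - w)]).unsqueeze(-1)
y = series[w:].unsqueeze(-1)

model = SeriesLSTM()
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
```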
C. System Model
Consider a frequency spectrum sensor that saves the power spectral density (PSD) of an RF band of F frequency bins at T times a day. Therefore, we have a matrix of size T × F for each day of recording. Additionally, assume data is recorded for N days. Corresponding to one frequency bin f, there exists a matrix of size T × N that presents occupancy changes over all times. Similarly, corresponding to one instant of time t, a matrix of size F × N represents the occupancy of the spectrum over all channels and days at a specific time of day. After N days (the number of frontal slices), the occupancy of the f-th frequency channel at the t-th time slot is the subject of prediction for the upcoming day. The proposed data arrangement is shown in Fig. 1 and sketched in code below.

To forecast the values of the next time steps of a sequence, we utilize a predictor that trains a regression network. Four types of training networks are used in this paper: the auto-regressive (AR) model, support vector machines (SVMs), convolutional neural networks (CNNs), and LSTM.
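As a concrete illustration of this arrangement (with hypothetical random PSD matrices standing in for real recordings), the daily T × F matrices can be stacked into an F × T × N tensor as follows.

```python
import numpy as np

F, T, N = 20, 240, 100                              # channels, samples/day, days (Sec. IV values)
daily = [np.random.rand(T, F) for _ in range(N)]    # one PSD matrix per recorded day (stand-in)

X = np.stack([D.T for D in daily], axis=2)          # frontal slices are days: shape (F, T, N)
assert X.shape == (F, T, N)

fiber = X[3, 120, :]   # a mode-3 fiber: channel 4 at the 121st daily time slot, across all days
```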
III. PROPOSED METHOD

The measured tensor consists of FT time series, each of which has N values along the long-time (day) dimension. Predicting every time series independently raises two issues: (i) the long-lag correlation between time series is neglected, so noise and missing entries can severely affect the prediction; (ii) predicting each time series implies learning a separate network, which requires a large computational burden.

Tensor decomposition learns a few principal factors for each way, such that all fibers in the corresponding way of the tensor can be reconstructed using a linear combination of the learned factors. The tensor CP representation is a concise model, and it is robust against perturbations and missing entries. It has been shown that a low-rank tensor can be recovered from a small number of entries using CP decomposition. In other words, the CP factors of the original tensor and the CP factors of a partial and noisy replica of the original tensor are close to each other [23]. These attractive characteristics of structured data in a high-dimensional tensor motivate us to employ tensor CP decomposition for dynamic spectrum completion and prediction.

Consider a rank-1 tensor $\mathcal{X} \in \mathbb{R}^{F \times T \times N}$ whose third dimension is the long-time variable n, as in Fig. 2. This tensor consists of FT time series (fibers) along its third way. Because $\mathcal{X}$ has rank 1, all these time series are scaled versions of a vector $\mathbf{c}$, which is a basic time series. This vector is broken into two parts: the given part, $\mathbf{c}_L$, which corresponds to the known part of the tensor, and the unknown part, $\mathbf{c}_P$, which corresponds to the part of the tensor to be predicted. Prediction of all unknown variables of the tensor is equivalent to prediction of $\mathbf{c}_P$. For a general rank-R tensor, there exist R basic time series that span the space of all fibers of the tensor in the third way. Thus, the prediction of R temporal factors enables us to predict any time series of the tensor.

[Figure 2 appears here: $\mathcal{X} = \mathbf{a} \circ \mathbf{b} \circ \mathbf{c}$, with $\mathbf{c}$ split into $\mathbf{c}_L$ and $\mathbf{c}_P$, equivalently the rank-1 matrix $\mathbf{a}\mathbf{b}^T$ modulated by the temporal factor.]

Fig. 2: A rank-1 tensor is the outer product of vectors, and it can be cast as the modulation of a rank-1 matrix with a temporal pattern.

Suppose our source of data is dynamic; then obsolete data might degrade the result of prediction. To tackle this problem, only recent slices are considered for learning. The number of slices for each epoch of prediction is referred to as the length of training, $N_L$. Likewise, we define the length of prediction, $N_P$, where $N = N_P + N_L$ is the size of the third dimension of the underlying tensor. The proposed tensor-based prediction solves the two following consecutive problems to predict the unknown entries of the tensor over time:

$$(\mathbf{A}, \mathbf{B}, \mathbf{C}_L) = \underset{\mathbf{A}, \mathbf{B}, \mathbf{C}_L}{\operatorname{argmin}} \ \|\mathcal{X}_L - \langle \mathbf{A}, \mathbf{B}, \mathbf{C}_L \rangle\|_F, \qquad (4a)$$
$$\mathcal{X}_P = \langle \mathbf{A}, \mathbf{B}, f(\mathbf{C}_L, \Omega) \rangle, \qquad (4b)$$

in which $f(\cdot,\cdot)$ represents a model for time-series prediction and $\Omega$ is the set of the model's parameters. We will investigate the effect of the prediction model on the performance of the whole framework. AR, SVM, CNN, and LSTM are studied as core models for prediction. However, our main proposed algorithm is LSTM-based. Fig. 3 shows the block diagram of the proposed prediction algorithm.

[Figure 3 appears here: the given tensor $\mathcal{X}_L$ is fed to CPD, yielding factors $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}_L$; LSTM extrapolates $\mathbf{C}_L$, and CP reconstruction (CPR) produces the predicted tensor $\mathcal{X}_P$.]

Fig. 3: Block diagram of our tensor-based time-series prediction.
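The block diagram translates into a few lines once the pieces above are available. This sketch assumes the cp_als and cp_reconstruct helpers from the Sec. II-A sketch, plus a forecast(series, steps) callable (e.g., a wrapper around the LSTM predictor shown earlier) that extrapolates one column of $\mathbf{C}_L$; it is an illustration of Eq. (4), not the authors' code.

```python
import numpy as np

def tensor_predict(X_L, R, N_P, forecast):
    # Eq. (4a): fit CP factors A, B, C_L to the learning tensor
    A, B, C_L = cp_als(X_L, R)
    # Eq. (4b): extrapolate each basic time series (column of C_L) by N_P steps
    C_P = np.column_stack([forecast(C_L[:, r], N_P) for r in range(R)])
    # CP reconstruction of the predicted slices: X_P = <A, B, C_P>
    return cp_reconstruct(A, B, C_P)
```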
CP decomposition reveals the latent factors of the data from different perspectives, and LSTM predicts the long temporal factors. The factors extracted using CP and extrapolated using LSTM can produce a tensor by CP reconstruction (CPR).

Since tensor analysis considers the multi-dimensional correlation of data, tensor completion is a state-of-the-art method for data completion in many applications [24], [25]. The proposed tensor-based scheme can be extended to joint completion and prediction in a straightforward formulation. Assume a given incomplete tensor, $\mathcal{X}_{IL}$, and a mask tensor of the same size as the data tensor, $\mathcal{M} \in \{0, 1\}^{F \times T \times N_L}$. The entries corresponding to 0 are not measured. The incomplete tensor can be completed using

$$\mathcal{X}_L = \mathcal{M} * \mathcal{X}_{IL} + (\mathbf{1} - \mathcal{M}) * \langle \mathbf{A}, \mathbf{B}, \mathbf{C}_L \rangle, \qquad (5)$$

in which $\mathbf{1}$ is a tensor with all entries equal to 1. The given data is kept, and the missing data is estimated using the CP factors. However, updating the incomplete tensor enables the algorithm to estimate a more accurate set of factors. Thus, the CP factors and the missing entries can be updated iteratively. Alg. 1 shows the proposed method for joint time-series completion and prediction. The main loop of the algorithm completes the data to find a better-fitted set of CP factors. Then, LSTM predicts the long-time factors, and the predicted time series result from CP reconstruction.

Algorithm 1: Time-series completion and prediction via tensor CP decomposition and LSTM prediction.
Input: incomplete tensor $\mathcal{X}_{IL}$, mask $\mathcal{M}$.
Output: completed and predicted tensor $\hat{\mathcal{X}}$.
1: $\mathbf{A}, \mathbf{B}, \mathbf{C}_L \leftarrow$ CP decomposition on $\mathcal{X}_{IL}$.
   While (the stopping criterion is not met)
2:   $\mathcal{X}_L \leftarrow$ update using Eq. (5).
3:   $\mathbf{A}, \mathbf{B}, \mathbf{C}_L \leftarrow$ CP decomposition on $\mathcal{X}_L$.
   End
4: $\mathbf{C}_P \leftarrow$ LSTM on each column of $\mathbf{C}_L$.
5: $\mathbf{C} \leftarrow$ concatenation of $\mathbf{C}_L$ and $\mathbf{C}_P$.
6: $\hat{\mathcal{X}} \leftarrow \langle \mathbf{A}, \mathbf{B}, \mathbf{C} \rangle$.

The tensor completion is performed using iterative CP decomposition and data interpolation. However, the exploited CP does not use the information of the mask; the mask is used only for data interpolation in (5). A modified version of CP decomposition is presented in Alg. 2 that infuses the information of the mask in order to estimate the CP factors of an incomplete tensor. The optimized CP for incomplete data can be employed in line 3 of Alg. 1 instead of the plain CP in order to estimate more accurate factors.
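Before detailing Alg. 2, here is a sketch of the completion-and-prediction loop of Alg. 1 under the same assumptions as before (cp_als and cp_reconstruct from Sec. II-A, an LSTM-backed forecast callable); the fixed outer-iteration count stands in for the unspecified stopping criterion.

```python
import numpy as np

def complete_and_predict(X_IL, M, R, N_P, forecast, n_outer=10):
    X_L = M * X_IL                                   # start from the observed entries only
    for _ in range(n_outer):                         # stopping criterion simplified to a count
        A, B, C_L = cp_als(X_L, R)                   # line 3: plain CP (or Alg. 2's masked CP)
        X_L = M * X_IL + (1 - M) * cp_reconstruct(A, B, C_L)  # Eq. (5): keep data, fill holes
    C_P = np.column_stack([forecast(C_L[:, r], N_P) for r in range(R)])  # LSTM per column
    C = np.vstack([C_L, C_P])                        # concatenate past and predicted factors
    return cp_reconstruct(A, B, C)                   # completed and predicted tensor
```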
Algorithm 2: Optimized CP for incomplete data.
Input: tensor $\mathcal{X}$, mask $\mathcal{M}$, and rank $R$.
Output: CP factors of $\mathcal{X}$.
1: $\mathbf{A}, \mathbf{B}, \mathbf{C} \leftarrow$ plain CP decomposition on $\mathcal{X}$ [14].
   While (the stopping criterion is not met)
2:   $\mathbf{A} \leftarrow \operatorname{argmin}_{\mathbf{A}} \|\mathbf{M}_{(1)} * (\mathbf{X}_{(1)} - \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T)\|_F$
3:   $\mathbf{B} \leftarrow \operatorname{argmin}_{\mathbf{B}} \|\mathbf{M}_{(2)} * (\mathbf{X}_{(2)} - \mathbf{B}(\mathbf{C} \odot \mathbf{A})^T)\|_F$
4:   $\mathbf{C} \leftarrow \operatorname{argmin}_{\mathbf{C}} \|\mathbf{M}_{(3)} * (\mathbf{X}_{(3)} - \mathbf{C}(\mathbf{B} \odot \mathbf{A})^T)\|_F$
   End
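Each minimization in Alg. 2 decouples across the rows of the factor being updated, since the mask only removes columns of the corresponding unfolding. A minimal sketch of one such masked update (masked_factor_update is our own helper name, not the authors' code):

```python
import numpy as np

def masked_factor_update(Xn, Mn, KR):
    """Solve min_F || M_(n) * (X_(n) - F @ KR.T) ||_F row by row.
    Xn, Mn: mode-n unfoldings of the data and mask; KR: Khatri-Rao product of the
    two factors held fixed (e.g., C (Khatri-Rao) B when updating A)."""
    F_new = np.zeros((Xn.shape[0], KR.shape[1]))
    for i in range(Xn.shape[0]):
        obs = Mn[i].astype(bool)             # entries of this row that were measured
        # least squares on the observed entries only (assumes >= R observations per row)
        F_new[i] = np.linalg.lstsq(KR[obs], Xn[i, obs], rcond=None)[0]
    return F_new
```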
SOL can be regarded as learning-based detection, where the problem is to detect whether a channel is occupied or not. Our decision rule for detection is based on the output of our proposed algorithm. Assume $\hat{x}_{ftn}$ is the predicted spectrum value at frequency channel f, time t, and day n. Two hypotheses are considered for the spectrum occupancy status of this entry:

$$S(f, t, n) = \begin{cases} \text{OCCUPIED} & \text{if } \hat{x}_{ftn} \geq \gamma \\ \text{NOT OCCUPIED} & \text{if } \hat{x}_{ftn} < \gamma \end{cases} \qquad (6)$$

in which $S(f, t, n)$ indicates the estimated occupancy status at frequency channel f, time t, and day n, and $\gamma$ is a threshold for operating the designed detector. As $\gamma$ increases, both the probability of detection and the probability of false alarm decrease. The receiver operating characteristic (ROC) of the proposed detector makes it possible to find the optimum threshold that achieves a desired false-alarm rate.
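The decision rule in Eq. (6) is simply an elementwise threshold on the predicted tensor; a one-line sketch:

```python
import numpy as np

def occupancy_map(X_hat, gamma):
    # Eq. (6): True = OCCUPIED wherever the predicted PSD reaches the threshold
    return X_hat >= gamma   # boolean tensor S(f, t, n)
```

Sweeping gamma trades detection probability against false alarms, which is exactly the ROC study reported in Sec. IV.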
IV. EXPERIMENTAL RESULTS

In the following experiments, we assume that 20 frequency channels are sensed. The PSD of each frequency channel is recorded 10 times an hour, i.e., there exist 240 spectrum measurements for each mode-2 fiber. Moreover, it is assumed that recordings for 100 days are available. Therefore, F = 20, T = 240, and N = 100.

The synthetic dataset for time t, day n, and frequency f follows the joint probability distribution $P(t, n, f) = P_t(t) P_n(n) P_f(f)$, where each distribution is generated according to the model below:

$$P_t(t) = \sum_{i=1}^{3} \beta_i \, \mathcal{N}(\tau_i, \sigma_i), \qquad (7a)$$
$$P_n(n \mid j) = \mathcal{N}(\mu_j, \lambda_j), \quad j = n \bmod 7, \qquad (7b)$$
$$P_f(f) = \mathcal{U}[1, 2, \ldots, F]. \qquad (7c)$$

Eq. (7a) is the probability of spectrum occupation in a typical day, which is modeled by a Gaussian mixture model (GMM) with three peaks, at 3 PM, 6 PM, and 9 PM. The parameters $\{\beta_i, \tau_i, \sigma_i\}$ are designed to satisfy the desired GMM pattern; in particular, $\tau_1 = 150$ (3 PM), $\tau_2 = 180$ (6 PM), $\tau_3 = 210$ (9 PM), and $\sigma_i = 20$ (2 hours). The conditional probability of occupancy over days follows (7b), where the condition determines to which day of the week n corresponds. The parameters $\{\mu_j, \lambda_j\}$ are designed such that on Mondays, Tuesdays, Wednesdays, and Thursdays the spectrum is more occupied than on Fridays ($\mu_1 = \mu_2 = \mu_3 = \mu_4 = 1$, with smaller means for Friday and the weekend and $\lambda_j$ proportional to $\mu_j$), and Friday is busier than the weekend [8]. In addition, a user has no preference for frequency occupation, which leads to a uniform distribution with equal probabilities over all frequency bins, as employed in (7c). This model is inferred from previous work [12].

The selected LSTM parameters are 4 hidden layers with 4 units each. The learning rate is 0.05, and the number of epochs is 300 with the ADAM optimizer. An Intel Core i7 CPU at 4.20 GHz with 8 GB RAM is used to perform the simulations in MATLAB 2018b.

The CPD-ALS algorithm determines the factors of the tensor numerically by solving alternating optimization problems. Calculating the CP rank of a tensor is an NP-hard problem. However, it is upper bounded by the following inequality [14]: $\operatorname{Rank}(\mathcal{X}) \leq \min(FT, FN, TN)$. A practical solution for finding the rank is to start with a low number, compute the normalized reconstruction error, and increase the rank as needed. The normalized error is obtained as a function of the rank as follows:

$$e_{cpd}(R) = \frac{\|\mathcal{X} - \hat{\mathcal{X}}(R)\|_F}{\|\mathcal{X}\|_F}, \qquad (8)$$

in which $\|\cdot\|_F$ denotes the Frobenius norm, R takes values from 1 to a maximum rank, and $\hat{\mathcal{X}}(R)$ is the rank-R approximation of $\mathcal{X}$ optimized by a tensor decomposition algorithm. The goal is to select the lowest rank that approximates $\mathcal{X}$ well. The effect of the rank on training the basic time series is investigated later.

In this experiment, the results of the proposed method are exhibited. The synthesized data is organized into an F × T × N tensor, in which F = 20 (20 frequency bins), T = 240 (240 measurements per day), and N = 100 (100 days). With rank 10, CP decomposition provides $\mathbf{A} \in \mathbb{R}^{20 \times 10}$, $\mathbf{B} \in \mathbb{R}^{240 \times 10}$, and $\mathbf{C} \in \mathbb{R}^{100 \times 10}$. In order to evaluate the prediction performance, the underlying tensor is broken into two tensors: (i) the learning tensor, $\mathcal{X}_L$, and (ii) the test tensor, $\mathcal{X}_P$, which is the subject of prediction. In this experiment, $N_L = 80$ days are used for learning and $N_P = 20$ days are considered for prediction.
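A sketch of a generator following Eqs. (7a)-(7c) is given below. Since the mixture weights $\beta_i$ and the exact Friday/weekend means are not fully listed in the text, the numeric values marked as assumed are illustrative placeholders; the peak locations and widths follow the stated 3 PM / 6 PM / 9 PM pattern at 10 samples per hour.

```python
import numpy as np

F, T, N = 20, 240, 100
rng = np.random.default_rng(0)

def gauss(x, mu, s):
    # Gaussian density evaluated on a grid
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Eq. (7a): three-peak GMM over the daily axis (slots 150, 180, 210 = 3/6/9 PM); betas assumed
t = np.arange(T)
P_t = sum(b * gauss(t, tau, 20.0) for b, tau in zip([0.4, 0.3, 0.3], [150, 180, 210]))

# Eq. (7b): day-of-week level, Mon-Thu = 1 > Fri > weekend (Fri/weekend values assumed;
# Monday mapped to index 0), with lambda_j taken proportional to mu_j
mu = np.array([1.0, 1.0, 1.0, 1.0, 0.7, 0.4, 0.4])
j = np.arange(N) % 7
P_n = rng.normal(loc=mu[j], scale=0.1 * mu[j])

# Eq. (7c): uniform preference over the F channels
P_f = np.full(F, 1.0 / F)

# Separable occupancy intensity P(t, n, f) = P_t(t) P_n(n) P_f(f), arranged as (F, T, N)
X = np.einsum('f,t,n->ftn', P_f, P_t, P_n)
```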
TABLE I: Normalized Prediction Error and Processing Time (sec)

Method      CPD time   Learning time   Total time   Error %
AR [26]     N/A        55.12           55.12        33.55
AR+CPD      3.71       4.23            7.94         21.83
SVM [21]    N/A        1202.21         1202.21      23.78
SVM+CPD     3.71       20.52           24.23        16.94
CNN [8]     N/A        496.44          496.44       22.40
CNN+CPD     3.71       15.87           19.58        17.81
LSTM [27]   N/A        2389.96         2389.96      23.71
LSTM+CPD    3.71       12.01           15.72

The obtained long-time CP factors, $\mathbf{C}_L \in \mathbb{R}^{80 \times 10}$, are exploited to predict $\mathbf{C}_P \in \mathbb{R}^{20 \times 10}$. Each column of $\mathbf{C}_L$ is a pseudo-time series that is employed for the prediction of $\mathbf{C}_P$ independently. Predicted values from the AR, SVM, CNN, and LSTM training networks are computed. We also calculate the prediction on the matrix-based data using the aforementioned training methods to demonstrate the impact of utilizing CPD. A numerical comparison with the other methods is presented in Table I. The tensor-based methods improve the prediction accuracy as well as save computational burden.

Employing LSTM for the prediction of CP factors exhibits the best results, and it decreases the computational cost compared to plain LSTM on the set of raw time series. The normalized error is computed using the following rule: $e_p = \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i x_i^2}$, where $x_i$ and $\hat{x}_i$ are the actual and predicted values of the time series. It can be observed that each prediction technique is improved by employing CPD. Our proposed method, LSTM+CPD, returns the best performance in terms of the normalized prediction error. In general, LSTM outperforms the methods based on AR, SVM, or CNN [26], [21], [8]. It is worthwhile to notice that our proposed method predicts spectrum occupancy more accurately than performing LSTM on the raw time-series data [27]. On top of the enhanced prediction error, CPD achieves a massive data reduction. Table I reports the processing time for each method and illustrates that exploiting CPD substantially diminishes the total running time of prediction.

In the next experiment, the proposed method, Alg. 1, is employed for missing-spectrum recovery when a portion of the spectrum measurements is missing. To this aim, the whole tensor is assumed to be incomplete. Therefore, random measurements from an F × T × N tensor are available to recover the whole tensor. The proposed spectrum completion algorithm requires performing CPD in each iteration of completion. It is shown that employing the modified CP for incomplete data, Alg. 2, is more effective for missing-spectrum recovery. Each iteration of data completion using the optimized CP needs more computation; however, the number of iterations needed for the modified CP is much smaller than for the plain CP.
[Figure 4 appears here.]

Fig. 4: Normalized completion error using the proposed method in Alg. 1. (a) Over iterations. (b) For different missing ratios.

[Figure 5 appears here.]

Fig. 5: Normalized error of prediction vs. the assumed rank of the underlying tensor. As the rank increases, the learning error decreases. However, increasing the rank causes over-learning for prediction; thus, the prediction error is not necessarily decreasing.

Fig. 4(a) shows the performance of our proposed time-series completion method using the plain CP and the introduced CP versus the iteration of data completion in Alg. 1. The performance of our proposed joint completion and prediction approach is presented over a range of missing ratios. The plain CP algorithm and the modified CP are compared for performing Alg. 1 to solve the joint problem. In this experiment, the time series of $N_L$ days are considered for learning and $N_P$ days for prediction. The learning tensor, $\mathcal{X}_L$, is assumed to have missing entries. As can be seen in Fig. 4(b), our proposed algorithm successfully completes the data in terms of the normalized error and predicts the time series using LSTM. As previously stated, the modified CP outperforms the plain CP in the presence of missing entries. The prediction error is close to that obtained by exploiting all the data for learning, as presented in Table I; even with missing entries in the learning data, the prediction error remains close to the error obtained by learning from the full tensor.

Each component of the CP decomposition learns some patterns of the data. Selecting a rank equal to R provides R sets of factors that reconstruct the learning tensor. As the assumed rank increases, more details of the learning tensor are captured and the reconstruction error decreases. Fig. 5(a) shows the reconstruction error of the learning tensor versus the assumed rank. However, learning fine details does not help prediction; thus, the imposed rank cannot be an arbitrarily large number. Fig. 5(b) shows the performance of prediction using LSTM versus the selected CP rank for decomposition of the learning tensor. As shown, beyond rank 10 the normalized prediction error does not decrease with increasing rank.

The last experiment of this paper shows the performance of spectrum occupancy detection. Two hypotheses are considered based on Eq. (6). The detection performance is determined using the ground truth of spectrum occupancy from the synthesized data. Our proposed spectrum prediction results in a value for the spectrum in each channel over time. This value is turned into a decision by Eq. (6). The probability of detection, $P_D$, vs. the probability of false alarm, $P_F$, is plotted by applying different values of the threshold. The use of AR, SVM, CNN, and LSTM in the tensor-based prediction is compared via the ROC graphs in Fig. 6. LSTM exhibits better performance for the detection of free channels; that is, at a fixed false-alarm rate, the probability of detection using the proposed LSTM-based method is higher than for the other methods.

[Figure 6 appears here.]

Fig. 6: ROC of the proposed detector in Eq. (6).
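To generate an ROC like Fig. 6, one sweeps $\gamma$ over the range of predicted values and measures the two rates against the ground-truth occupancy. A small sketch (roc_points is our own helper name):

```python
import numpy as np

def roc_points(X_hat, truth, thresholds):
    """Eq. (6) swept over gamma: returns (P_F, P_D) pairs for the ROC curve."""
    occ = truth.astype(bool)
    P_D, P_F = [], []
    for gamma in thresholds:
        decided = X_hat >= gamma
        P_D.append((decided & occ).sum() / occ.sum())      # detections among occupied entries
        P_F.append((decided & ~occ).sum() / (~occ).sum())  # false alarms among free entries
    return np.array(P_F), np.array(P_D)

# Example sweep over 100 thresholds between the extremes of the predictions:
# pf, pd = roc_points(X_hat, ground_truth, np.linspace(X_hat.min(), X_hat.max(), 100))
```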
V. CONCLUSION

In this paper, a combination of tensor decomposition and LSTM time-series prediction is proposed as a new paradigm for large-scale spectrum occupancy prediction. The measured spectrum data is organized into a 3-way tensor. The CPD-ALS algorithm is performed to obtain CP factors for big-data reduction and for learning reliable patterns of the data. The LSTM network is then utilized to predict the CP factors in order to estimate future spectrum occupancy patterns over time and for all frequency channels. Employing LSTM as the core predictor of CP factors outperforms other schemes such as AR, SVM, and CNN. The performance in handling missing data in the sensed spectrum illustrated the robustness of the CP factors against perturbations of the learning information.
REFERENCES
[1] R. Li, Z. Zhao, X. Zhou, G. Ding, Y. Chen, Z. Wang, and H. Zhang, "Intelligent 5G: When cellular networks meet artificial intelligence," IEEE Wireless Communications, vol. 24, no. 5, pp. 175–183, 2017.
[2] J. Zhang, G. Ding, Y. Xu, and F. Song, "On the usefulness of spectrum prediction for dynamic spectrum access," in Proc. 8th International Conference on Wireless Communications & Signal Processing (WCSP). IEEE, 2016, pp. 1–4.
[3] H. Eltom, S. Kandeepan, Y. C. Liang, B. Moran, and R. J. Evans, "HMM based cooperative spectrum occupancy prediction using hard fusion," in Proc. IEEE International Conference on Communications Workshops (ICC). IEEE, 2016, pp. 669–675.
[4] J.-W. Wang and R. Adriman, "Analysis of opportunistic spectrum access in cognitive radio networks using hidden Markov model with state prediction," EURASIP Journal on Wireless Communications and Networking, vol. 2015, no. 1, p. 10, 2015.
[5] S. Bai, X. Zhou, and F. Xu, "Spectrum prediction based on improved-back-propagation neural networks," in Proc. 11th International Conference on Natural Computation (ICNC). IEEE, 2015, pp. 1006–1011.
[6] A. A. Eltholth, "Spectrum prediction in cognitive radio systems using a wavelet neural network," in Proc. 24th International Conference on Software, Telecommunications and Computer Networks (SoftCOM). IEEE, 2016, pp. 1–6.
[7] A. D'Alconzo, A. Coluccia, F. Ricciato, and P. Romirer-Maierhofer, "A distribution-based approach to anomaly detection and application to 3G mobile traffic," in Proc. IEEE Global Telecommunications Conference (GLOBECOM). IEEE, 2009, pp. 1–8.
[8] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, "Citywide cellular traffic prediction based on densely connected convolutional neural networks," IEEE Communications Letters, 2018.
[9] L. Yu, J. Chen, and G. Ding, "Spectrum prediction via long short term memory," in Proc. 3rd IEEE International Conference on Computer and Communications (ICCC). IEEE, 2017, pp. 643–647.
[10] D. Yogatama, C. Dyer, W. Ling, and P. Blunsom, "Generative and discriminative text classification with recurrent neural networks," arXiv preprint arXiv:1703.01898, 2017.
[11] Y. Fan, X. Lu, D. Li, and Y. Liu, "Video-based emotion recognition using CNN-RNN and C3D hybrid networks," in Proc. 18th ACM International Conference on Multimodal Interaction. ACM, 2016, pp. 445–450.
[12] D. Willkomm, S. Machiraju, J. Bolot, and A. Wolisz, "Primary users in cellular networks: A large-scale measurement study," in Proc. 3rd IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN). IEEE, 2008, pp. 1–11.
[13] B. Li, X. Zhang, X. Li, and H. Lu, "Tensor completion from one-bit observations," IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 170–180, 2019.
[14] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[15] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, "Tensor decomposition for signal processing and machine learning," IEEE Transactions on Signal Processing, vol. 65, no. 13, pp. 3551–3582, 2017.
[16] N. D. Sidiropoulos and R. Bro, "On the uniqueness of multilinear decomposition of N-way arrays," Journal of Chemometrics, vol. 14, no. 3, pp. 229–239, 2000.
[17] C. Qian, X. Fu, N. D. Sidiropoulos, and Y. Yang, "Tensor-based channel estimation for dual-polarized massive MIMO systems," arXiv preprint arXiv:1805.02223, 2018.
[18] M. N. da Costa, G. Favier, and J. M. T. Romano, "Tensor modelling of MIMO communication systems with performance analysis and Kronecker receivers," Signal Processing, vol. 145, pp. 304–316, 2018.
[19] J. D. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an N-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.
[20] R. A. Harshman, "Foundations of the PARAFAC procedure: Models and conditions for an 'explanatory' multimodal factor analysis," 1970.
[21] F. Azmat, Y. Chen, and N. Stocks, "Analysis of spectrum occupancy using machine learning algorithms," IEEE Transactions on Vehicular Technology, vol. 65, no. 9, pp. 6853–6860, 2016.
[22] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[23] A. Wang, D. Wei, B. Wang, and Z. Jin, "Noisy low-tubal-rank tensor completion through iterative singular tube thresholding," IEEE Access, vol. 6, pp. 35112–35128, 2018.
[24] A. Montanari and N. Sun, "Spectral algorithms for tensor completion," Communications on Pure and Applied Mathematics, vol. 71, no. 11, pp. 2381–2425, 2018.
[25] J. Yang, Y. Zhu, K. Li, J. Yang, and C. Hou, "Tensor completion from structurally-missing entries by low-TT-rankness and fiber-wise sparsity," IEEE Journal of Selected Topics in Signal Processing, 2018.
[26] A. Eltholth, "Forward backward autoregressive spectrum prediction scheme in cognitive radio systems," in Proc. 9th International Conference on Signal Processing and Communication Systems (ICSPCS). IEEE, 2015, pp. 1–5.
[27] L. Yu, J. Chen, G. Ding, Y. Tu, J. Yang, and J. Sun, "Spectrum prediction based on Taguchi method in deep learning with long short-term memory,"