A Novel Motion Detection Method Resistant to Severe Illumination Changes
Sahar Yousefi, M.T. Manzuri Shalmani, Jeremy Lin, Marius Staring
Sharif University of Technology, Tehran, Iran; Leiden University Medical Center, Leiden, The Netherlands; PJM Interconnection, Audubon, PA 19403, USA; Delft University of Technology, Delft, The Netherlands
Abstract: Recently, considerable attention has been given to the motion detection problem due to the explosive growth of its applications in video analysis and surveillance systems. While previous approaches can produce good results, accurate motion detection remains a challenging task due to illumination variations, occlusion, camouflage, burst physical motion, dynamic texture, and environmental changes such as changing weather and changes in sunlight during the day. In this paper, we propose a novel per-pixel motion descriptor for both motion detection and dynamic texture segmentation, which outperforms current methods in the literature, particularly in severe scenarios. The proposed descriptor is based on two complementary three-dimensional tools: the three-dimensional discrete wavelet transform (3D-DWT) and the three-dimensional wavelet leader. In this approach, a feature vector is extracted for each pixel by applying the novel three-dimensional wavelet-based motion descriptor. The extracted features are then clustered with a clustering method such as the well-known k-means algorithm or a Gaussian Mixture Model (GMM). The experimental results demonstrate the effectiveness of the proposed method compared with other motion detection approaches from the literature. An application implementing the proposed method and additional experimental results for different datasets are available at (http://dspl.ce.sharif.edu/motiondetector.html).
Keywords: Motion detection, Dynamic texture, 3D discrete wavelet transform, Wavelet leader.
I. INTRODUCTION
Over the past decade, motion detection has attracted increasing attention due to its wide range of applications in video surveillance, natural disaster investigation systems, and other areas, and a wide variety of approaches have been proposed in the literature [1-12]. These approaches can be divided into two categories: 1) spatial domain methods and 2) frequency domain methods. Spatial domain approaches often employ spatiotemporal descriptors to model local motion patterns, ignoring holistic motion patterns. St-Charles et al. [12] proposed the Self-Balanced SENsitivity SEgmenter (SuBSENSE). In this method, the authors used a spatiotemporal local binary similarity pattern (LBSP) to characterize pixel representations in a nonparametric paradigm that is tuned by pixel-level feedback loops; in other words, the background model is tuned through these feedback loops. LBSP defines for each pixel $p$ a neighbor set $N(p)$ on the frames, and then assigns a binary pattern to $p$ based on the gray-level differences between each neighbor $q \in N(p)$ and $p$. If the illumination change is non-uniform (i.e., the gray levels of only a subset of the neighbors change), the binary pattern changes; therefore, LBSP is not robust to non-uniform illumination variation. Also, to regularize the process and eliminate salt-and-pepper noise, SuBSENSE uses morphological operations and a median filter, which can also eliminate distant and tiny moving objects. Moreover, the background tuning process is usually slow, which makes it difficult to adapt to sudden illumination variations and burst motions [9]. Bianco et al. [10] used Genetic Programming to combine state-of-the-art motion detection approaches in order to obtain the best solution.
This method suffers from a heavy computational burden, and there is no guarantee of finding the global optimum. In contrast, frequency domain approaches can take holistic motion pattern information into account [13]. In [14] and [15], the two-dimensional discrete wavelet transform (2D-DWT) is used for moving object detection: these methods compare the 2D-DWT of the current frame with those of previous frames to detect motion, using a predefined threshold for the comparison. They do not consider the intrinsic temporal dimension when computing the wavelet coefficients, and hence their results are sensitive to the predefined threshold. Although the approaches mentioned above can achieve good results, accurate motion detection remains challenging due to illumination variations, occlusion, camouflage, burst physical motion, dynamic texture, and environmental changes such as changing weather and changes in sunlight during the day. In this paper, we propose a novel motion detection method that defines a pixel-based feature descriptor on three-dimensional discrete wavelet transform (3D-DWT) representations, which can overcome these difficulties. Applying the well-known 2D wavelet transform to an image computes, at each decomposition level, the approximation coefficients and the detail coefficients in three directions (horizontal, vertical, and diagonal). Applying the 3D wavelet transform to a volume computes the approximation coefficients and detail coefficients in seven different directions. Therefore, a frequency domain method can capture holistic motion pattern information: the coefficients in specific directions can model the motion pattern across time (i.e., across consecutive frames). In this paper, we propose a new per-pixel motion descriptor based on the 3D-DWT, which yields feature vectors for modeling the motion pattern.
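The sensitivity of LBSP to non-uniform illumination discussed above can be illustrated with a simplified, LBSP-like binary similarity code. This is a hedged sketch, not the LBSP of SuBSENSE [12]: it compares each neighbor with the center using a single absolute threshold, whereas the actual descriptor is more elaborate, and the function name `similarity_code` and the pixel values are hypothetical.

```python
def similarity_code(center, neighbors, threshold):
    """Simplified LBSP-like code: bit k is 1 when neighbor k lies within
    `threshold` gray levels of the center pixel, else 0."""
    return tuple(int(abs(q - center) <= threshold) for q in neighbors)


center = 100
neighbors = [102, 98, 101, 99, 130, 70, 100, 104]
original = similarity_code(center, neighbors, threshold=5)

# Uniform illumination change: every pixel brightens by 20 gray levels.
uniform = similarity_code(center + 20, [q + 20 for q in neighbors], threshold=5)

# Non-uniform change: only the first four neighbors brighten by 20.
partial = [q + 20 if k < 4 else q for k, q in enumerate(neighbors)]
non_uniform = similarity_code(center, partial, threshold=5)

print(original)     # (1, 1, 1, 1, 0, 0, 1, 1)
print(uniform)      # (1, 1, 1, 1, 0, 0, 1, 1)
print(non_uniform)  # (0, 0, 0, 0, 0, 0, 1, 1)
```

A uniform shift leaves every difference unchanged, so the code is stable; brightening only part of the neighborhood flips four bits even though no object moved.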
Change detection algorithms face several potential challenges. Camouflage is one of them: motion detection is difficult when the color of dynamic objects is similar to that of the background [16]. In [12] and [17], spatiotemporal binary features and color information are used to detect camouflaged foreground objects. Ramírez et al. [16] proposed a thresholding approach that tunes the threshold values based on an analysis of the global hue histogram of the frames. Because of its global character, this method cannot handle frames in which one color locally predominates in the scene. Our proposed method, which uses wavelet-based spatial frequency descriptors, achieves a high degree of robustness to camouflaged foreground objects. Burst physical motion, which has received little attention, is another challenging issue in motion detection. Moving escalators, swirling wheels, fans, and swinging foliage are examples of burst physical motion. Background tuning approaches such as [11, 12, 18], which update the background regions from the pixel history with a fixed learning rate, are sensitive to burst motions [9]. Liang et al. proposed a frequency- and speed-adaptive background model for this purpose [3, 9]. In our approach, burst physical motion is handled by the intrinsically frequency-based motion pattern descriptors. Dynamic texture is the last challenging issue we consider. To our knowledge, none of the existing approaches treats dynamic textures as moving regions. Because its wavelet-coefficient-based descriptors characterize motion patterns by their motion behavior, our approach is able to handle dynamic textures.
In the proposed approach, a spatial frequency representation of motion patterns is first built from three-dimensional low-pass and high-pass wavelet coefficients as well as wavelet leaders; wavelet leaders overcome the problem of the large number of close-to-zero wavelet coefficients [19]. A set of pixel-based features is then computed over these wavelet coefficients. After feature extraction, the feature vectors are clustered with the well-known k-means algorithm. The three main contributions of this paper are: (1) a robust and effective motion detection approach that outperforms previous methods on videos with difficult conditions such as changing weather, illumination variations, and camouflaged foreground objects; (2) a novel method that detects background motions, such as burst physical motions and dynamic textures, based on their motion patterns; and (3) a spatial frequency, per-pixel feature extraction method based on high-pass/low-pass wavelet coefficients and wavelet leaders, which yields accurate motion detection results and prevents jagged boundaries. The remainder of this paper is organized as follows: Section II gives an overview of the proposed method, and Section III explains the three-dimensional wavelet transform and wavelet leaders. Section IV describes the proposed wavelet-based motion descriptors. Section V evaluates the proposed model on various video sequences in comparison with previous approaches. Finally, Section VI concludes the paper.
II. OVERVIEW OF THE PROPOSED METHOD
In this section, we provide an overview of the proposed approach. Figure 1 illustrates the flowchart of the proposed method. The implementation of the proposed method, provided as a tool, is freely available at our Motion Detection webpage: http://dspl.ce.sharif.edu/projects/MotionDetector/MotionDetector.rar.
A. Deinterlacing
Deinterlacing is a process in which interlaced video sequences are converted into progressive sequences. The datasets used in this paper were deinterlaced with a spatiotemporal median filter, a non-linear filter obtained by extending the spatial median filter to spatiotemporal neighbors [20, 21].
B. Three-Dimensional Discrete Wavelet Transform
To capture both the motion and the appearance of the video sequences, we use the three-dimensional discrete wavelet transform (3D-DWT). As mentioned before, for two-dimensional data (e.g., an image), the 2D wavelet transform computes the approximation coefficients and the detail coefficients in three directions (horizontal, vertical, and diagonal). Applying the 3D wavelet transform to three-dimensional data (i.e., a volume) computes the approximation coefficients and detail coefficients in seven different directions. These directional coefficients can model the motion pattern across time in different directions; hence, by working in the frequency domain, we can capture holistic motion pattern information. In this paper, we propose a new pixel-based motion descriptor based on 3D wavelet coefficients, which models the motion pattern in three-dimensional space.
Figure 1. Flowchart of the proposed method: input video → deinterlacing → 3D-DWT (one level yields the sub-bands LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH) → feature extraction → classification → output label field.
The 3D-DWT can be decoupled into two steps, a 2D spatial DWT and a 1D temporal DWT; in other words, the spatial and temporal transforms are performed separately [22]. As described in the following section, the spatial DWT captures holistic motion pattern information in space, and the temporal DWT captures it across time. In our proposed method, the 3D-DWT is applied to volumes (cubic patches). Each level of the 3D-DWT applied to a volume produces eight sub-cubic patches; this process is shown for one level in Figure 1.
C. Feature Extraction
To obtain accurate results and prevent jagged boundaries, the feature vectors are computed per pixel from the outputs of the previous step. Zhang et al. proposed feature vectors based on the 2D-DWT [23]. In our work, we propose eight feature vectors based on the 3D-DWT decomposition coefficients and the 3D wavelet leader, which are described in the following sections.
D. Classification
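The classification step of this subsection groups the per-pixel feature vectors into two clusters, motion and zero-motion. The following is a minimal sketch of that step, assuming plain Lloyd's k-means with k = 2. The deterministic initialization (smallest- and largest-norm points), the rule that the cluster whose center has the larger norm is labelled motion, and the function name `kmeans2` are our assumptions, not details taken from the paper.

```python
import math


def _norm(v):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(x * x for x in v))


def kmeans2(points, iterations=20):
    """Two-cluster Lloyd's k-means on a list of equal-length feature tuples.

    Returns one label per point: 1 for the cluster whose centre has the
    larger norm (treated here as 'motion'), 0 for the other cluster.
    """
    # Deterministic initialisation: the smallest- and largest-norm points.
    centres = [list(min(points, key=_norm)), list(max(points, key=_norm))]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [0 if math.dist(p, centres[0]) <= math.dist(p, centres[1]) else 1
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    motion = 1 if _norm(centres[1]) >= _norm(centres[0]) else 0
    return [1 if lab == motion else 0 for lab in labels]


features = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.1), (5.0, 4.0), (4.5, 5.5)]
print(kmeans2(features))  # → [0, 0, 0, 1, 1]
```

Labelling the higher-energy cluster as motion is one simple convention; a GMM, the alternative mentioned in this subsection, would replace the hard nearest-center assignment with posterior probabilities.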
After feature extraction, the feature vectors are classified into two classes: motion vs. zero-motion. We implemented two approaches for this purpose, k-means and the Gaussian Mixture Model (GMM); in this paper, results are reported using only the k-means classification approach.
III. WAVELET AND WAVELET LEADER
The 2D-DWT is widely used for moving texture detection, such as smoke detection [24] and fire detection [25]. Demonceaux et al. [26] combined the 2D-DWT with hierarchical Markov random fields for motion detection; to overcome temporal aliasing, they estimate the dominant motion at several image resolutions. However, their segmentation results exhibit obviously jagged boundaries. To prevent jagged boundaries, we propose a three-dimensional wavelet-coefficient feature-based approach for motion detection. In the one-dimensional wavelet transform, a function $f(t)$ can be analyzed as
$$f(t)=\sum_{s}\sum_{\tau}\alpha_{s,\tau}\,\psi_{s,\tau}(t),\qquad(1)$$
where $\alpha_{s,\tau}$ are the wavelet coefficients, obtained by the inner product
$$\alpha_{s,\tau}=\langle f,\psi_{s,\tau}\rangle=\int f(t)\,\psi_{s,\tau}(t)\,dt.\qquad(2)$$
Each wavelet coefficient $\alpha_{s,\tau}$ represents the resemblance of the function $f(t)$ to the wavelet basis function $\psi_{s,\tau}$ at translation $\tau$ and scale $s$. In this paper, we use the Coiflet-like, nearly symmetric orthogonal wavelet bases with magnitude and group-delay flatness specifications proposed in [27]. Using the multi-scale wavelet decomposition scheme, a hierarchy of localized sub-functions at different spatial frequencies can be found. As mentioned before, a 3D-DWT can be decoupled into two steps, a 2D spatial DWT and a 1D temporal DWT, and splitting the transform into spatial and temporal dimensions yields holistic motion pattern information through space and time. Given a volume $V$, the 3D-DWT decomposes $V$ into (1) one low-frequency channel $w^{LLL}_{S}$, (2) seven strictly high-frequency channels $\tilde{w}_{S}$, and (3) multiple non-strictly high-frequency channels $w_{s,l,o}$ for $s\in\mathcal{S}$, where $\mathcal{S}$ is the set of scales, $S$ is the coarsest scale, $o\in O=\{\text{vertical},\text{horizontal},\text{diagonal}\}$ is the orientation, and $l\in\{\text{up},\text{down}\}$ is the level.
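To make the decomposition concrete, the following sketch computes one level of a separable 3D-DWT and the sub-band shapes of a multi-level decomposition. It is a minimal sketch under stated assumptions: it uses the orthonormal Haar filter pair instead of the Coiflet-like bases of [27], labels each sub-band by the filter applied along axes 0, 1, and 2 in that order (axis 0 is taken as time in the example), and halves odd lengths with floor division. The names `haar_split`, `dwt3_haar`, and `decomposition_shapes` are hypothetical. Under the floor-halving assumption, `decomposition_shapes` reproduces the scale counts and decomposition levels listed in Table 1.

```python
import itertools
import math

SQRT2 = math.sqrt(2.0)


def haar_split(vol, axis):
    """Split a 3D nested list along `axis` into (low, high) halves.

    Uses the orthonormal Haar pair low=(a+b)/sqrt(2), high=(a-b)/sqrt(2).
    With an odd length along `axis`, the last sample is dropped (floor halving).
    """
    shape = [len(vol), len(vol[0]), len(vol[0][0])]
    shape[axis] //= 2
    low = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    high = [[[0.0] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    for i, j, k in itertools.product(*(range(n) for n in shape)):
        a_idx, b_idx = [i, j, k], [i, j, k]
        a_idx[axis] = 2 * a_idx[axis]      # even sample of the pair
        b_idx[axis] = 2 * b_idx[axis] + 1  # odd sample of the pair
        a = vol[a_idx[0]][a_idx[1]][a_idx[2]]
        b = vol[b_idx[0]][b_idx[1]][b_idx[2]]
        low[i][j][k] = (a + b) / SQRT2
        high[i][j][k] = (a - b) / SQRT2
    return low, high


def dwt3_haar(vol):
    """One level of a separable 3D Haar DWT: returns the eight sub-bands
    keyed 'LLL' ... 'HHH' (letters follow axes 0, 1, 2)."""
    bands = {"": vol}
    for axis in range(3):
        split = {}
        for name, v in bands.items():
            low, high = haar_split(v, axis)
            split[name + "L"] = low
            split[name + "H"] = high
        bands = split
    return bands


def decomposition_shapes(shape):
    """Sub-band shapes per level: keep halving (floor) while every side is >= 2."""
    levels = []
    while min(shape) >= 2:
        shape = tuple(n // 2 for n in shape)
        levels.append(shape)
    return levels


# Two 2x2 frames along axis 0 (time): a dark frame followed by a bright one.
volume = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
bands = dwt3_haar(volume)
for name in sorted(bands):
    print(name, round(bands[name][0][0][0], 4))
print(decomposition_shapes((8, 8, 6)))  # → [(4, 4, 3), (2, 2, 1)]
```

In this toy volume the only change is along the temporal axis, so all the energy lands in LLL and HLL, the sub-band that is high-pass in time and low-pass in both spatial directions; this is the sense in which temporally high-pass sub-bands respond to motion.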
Decomposing a volume $V$ into the wavelet coefficient set is a recursive process in which, at each scale $s\in\mathcal{S}$, a function $\mathcal{W}_s$ decomposes the low-frequency sub-band of the previous scale:
$$\mathbf{w}_{s}=\mathcal{W}_{s}\left(w^{LLL}_{s-1}\right),\qquad w^{LLL}_{0}=V,\qquad(3)$$
and
$$\mathbf{w}_{s}=\left\{w^{LLL}_{s},w^{LLH}_{s},w^{LHL}_{s},w^{LHH}_{s},w^{HLL}_{s},w^{HLH}_{s},w^{HHL}_{s},w^{HHH}_{s}\right\}.\qquad(4)$$
Figure 2: Three-dimensional discrete wavelet filter bank. The volume $V$ is filtered along x, y, and z in turn with the low-pass filter h0 and the high-pass filter h1, each followed by downsampling by 2, producing the eight sub-bands $w^{LLL}(V), w^{LLH}(V), \dots, w^{HHH}(V)$.
In terms of filters, Figure 2 illustrates the 3D-DWT filter bank for $V$; it shows the sub-volumes produced by each filter for one scale. To improve the robustness of the descriptors built on the wavelet coefficients, we use wavelet leaders [28]. Wavelet leaders are wavelet-based measurements first defined in [28], and they have since been used in various applications [13, 29-32]. Chen et al. used wavelet leader pyramids for image quality assessment [30]. Pustelnik et al. combined wavelet leaders with proximal minimization to segment textures in images [31, 32]. Ji et al. used wavelet leaders for dynamic texture classification [13]. Wavelet leaders are defined as the maximum magnitude of all the wavelet coefficients in a local spatial neighborhood and a scale neighborhood. As mentioned before, the wavelet transform produces a large number of close-to-zero coefficients. To avoid the problems this raises, we propose three-dimensional wavelet leader pyramids. The wavelet leader for a volume $\Omega_p$ surrounding the pixel $p$ is defined as:
$$Leader(p)=\max_{s\in\mathcal{S},\,l,\,o\in O,\;r\in\Omega_p}\left|w_{s,l,o}(r)\right|.\qquad(5)$$
Figure 3 shows $\Omega_p$ with size (8×8×8): the volume is defined by selecting (8×8) neighborhood windows on a sequence of 8 frames; the red site is $p$ and the other sites compose $\Omega_p$. Figure 4 illustrates the three-dimensional wavelet coefficients and the wavelet leaders of a volume for three scales, based on equations (3) and (4); in the figure, the volume arguments of equations (3) and (4) are omitted for brevity.
Figure 3: $\Omega_p$ with size 8×8×8 on a sequence of frames; the red site is $p$ and the blue sites are the neighborhood set $\Omega_p$.
IV. WAVELET-BASED MOTION DESCRIPTORS
Using 3D wavelet coefficients, we can model holistic motion patterns in different directions. Applying the 3D wavelet transform to a volume (cubic patch) computes the approximation coefficients and the detail coefficients in seven different directions of 3D space. The high-pass wavelet coefficients can model the motion patterns over a sequence of frames. In this section, we propose a novel per-pixel motion descriptor based on the 3D-DWT that models these motion patterns. Pixel-based feature vectors help achieve accurate results and prevent jagged boundaries.
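The per-pixel descriptor of equation (6) and the wavelet leader of equation (5) can be sketched as follows. This is a minimal sketch under stated assumptions: the sub-band coefficients of the neighborhood $\Omega_p$ are already available as one flat list per scale (index 0 is scale 1), the quantity summed in equation (6) is taken to be the sub-band coefficient itself, and the leader is the maximum magnitude over the high-pass sub-bands LLH, LHH, and HLH across all scales, following the caption of Figure 5. The names `pixel_descriptor` and `wavelet_leader` and the coefficient values are hypothetical.

```python
import math


def pixel_descriptor(coeffs_per_scale):
    """Equation (6)-style feature: the Euclidean norm of the sub-band
    coefficients over the neighborhood, averaged over the S scales.

    coeffs_per_scale[s] holds the coefficients w_s(r) for every r in the
    neighborhood at scale s + 1 (a flat list of numbers).
    """
    norms = [math.sqrt(sum(c * c for c in coeffs)) for coeffs in coeffs_per_scale]
    return sum(norms) / len(norms)


def wavelet_leader(subbands, names=("LLH", "LHH", "HLH")):
    """Maximum coefficient magnitude over the chosen high-pass sub-bands and
    all scales; subbands[name] is a list (one entry per scale) of flat lists."""
    return max(abs(c) for name in names for coeffs in subbands[name] for c in coeffs)


# Hypothetical coefficients of one sub-band at two scales for a single pixel.
lhh = [[0.0, 3.0, 4.0, 0.0], [0.0, 0.0, 6.0, 8.0]]
print(pixel_descriptor(lhh))  # (5.0 + 10.0) / 2 = 7.5

subbands = {
    "LLH": [[0.1, -0.2], [0.05]],
    "LHH": lhh,
    "HLH": [[-9.0, 0.0], [0.3]],
}
print(wavelet_leader(subbands))  # |-9.0| = 9.0
```

The leader keeps only the single largest magnitude, so a neighborhood dominated by close-to-zero coefficients still yields a distinctive value when one strong coefficient is present, whereas the descriptor summarizes the energy of the whole neighborhood.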
The descriptor vectors are defined as:
$$f^{\omega}(p)=\frac{1}{S}\sum_{s=1}^{S}\sqrt{\sum_{r\in\Omega_p}\left|\phi^{\omega}_{s}(r)\right|^{2}},\qquad(6)$$
where $p$ is the central pixel of a neighborhood cube $\Omega_p$, $r$ denotes the neighbors of $p$ in $\Omega_p$, and $\phi^{\omega}_{s}(r)$ is defined as
$$\phi^{\omega}_{s}(r)=w^{\omega}_{s}(r),\qquad r\in\Omega_p,\;1\le s\le S,\qquad(7)$$
in which $S$ is the coarsest scale and $w^{\omega}_{s}$ denotes the wavelet sub-band of the $s$-th scale, where
$$\omega\in\{LLL,LLH,LHL,LHH,HLL,HLH,HHL,HHH,Leader\}.\qquad(8)$$
In equation (6), the square root of the sum of squares, $\sqrt{\sum_{r\in\Omega_p}|\phi^{\omega}_{s}(r)|^{2}}$, computes the Euclidean norm of the $s$-th scale of the wavelet sub-band in a space whose dimension equals the cubic patch size; it gives the ordinary distance from the origin to the patch vector. Averaging over multiple scale levels in equation (6) gives more weight to the significant wavelet components at finer scales and consequently reduces noise. In this paper, to compute the feature descriptors, we use $S$ scales for volumes of size ($2^S\times 2^S\times 2^S$).
Figure 4: (a) Wavelet coefficients and (b) wavelet leaders of a volume for three scales (sub-bands $W_{LLL}$ to $W_{HHH}$ at scales 1, 2, and 3).
Figure 5 illustrates the feature vectors extracted from the wavelet coefficients and the wavelet leaders for a set of consecutive frames, using cubic patches with size ( ) and ( S ). As the results indicate, the feature vectors extracted from the high-pass wavelet coefficients $w^{LHH}_{s}$, $w^{HLH}_{s}$, and $w^{LLH}_{s}$ exhibit significant motion patterns. Hence, we use these feature vectors, together with the wavelet leader computed from them, to model motion patterns in three-dimensional space.
V. EXPERIMENTAL RESULTS
In this section, to evaluate the performance of the proposed approach, we report qualitative and quantitative results on video datasets covering a variety of indoor and outdoor environments. The qualitative results are compared with CP3-online [9], IUTIS-3 [10], SuBSENSE [12], AAPSA [16], and MDT [33]. The quantitative results are compared with several unsupervised approaches: CP3-online [9], IUTIS-3 [10], SuBSENSE [12], AAPSA [16], C-EFIC [34], GMM|Zivkovic [35], CwisarDH [4], SOBS_CF [2], and AMBER [36]. We evaluate the proposed method on four datasets:
1. The CD.net 2014 dataset, including the NightVideos, Thermal, and intermittentObjectMotion categories.
2. The Wallflower dataset [38], which contains burst-motion video sequences, is used to evaluate burst detection. We use two of its scenarios: Camouflage and Waving Trees.
3. As mentioned before, dynamic texture detection can be treated as a motion detection problem. To examine the proposed method for dynamic texture detection, we use the Dyntex dataset [39], a comprehensive, large, and diverse database of high-quality dynamic textures, which have been deinterlaced with a spatiotemporal median filter. The sequences were acquired with a SONY 3-CCD camera on a tripod and recorded in PAL format (720 × 576).
4. The UCSD pedestrian dataset [33], which contains videos of pedestrians on UCSD walkways taken from a stationary camera with two different viewpoints. We compare the results of the proposed method with those of MDT [33] (available at http://visal.cs.cityu.edu.hk/).
Figure 5: (a)-(g) Features extracted from the high-pass wavelet coefficients $w_{LLH}$, $w_{LHL}$, $w_{LHH}$, $w_{HLL}$, $w_{HLH}$, $w_{HHL}$, $w_{HHH}$; (h) feature extracted from the low-pass wavelet coefficients $w_{LLL}$; (i) wavelet leader extracted from the high-pass wavelet coefficients (LLH, LHH, HLH).
A. Qualitative Comparison
In this section, we qualitatively compare our proposed method with various approaches. As mentioned before, our proposed method is compared with CP3-online [9], IUTIS-3 [10], SuBSENSE [12], AAPSA [16], and MDT [33]. Regarding occlusion, Figure 6 compares the moving object segmentation of the proposed method and previous approaches, including CP3-online [9], IUTIS-3 [10], and SuBSENSE [12], for two frames of the winterStreet sequence. In the figure, the red segments are the gray mask extracted from the ground truth, the blue segments show the moving regions, the green segments represent the static regions, and the yellow circles highlight the differences between the segmentation results. As the results show, our proposed method overcomes the occlusion problem. Figure 7 gives a further qualitative comparison of our proposed method with the aforementioned approaches for various frames of the winterStreet sequence of CD.net 2014. As shown, despite the severe conditions caused by video acquisition and car headlights at night, our proposed method handles the occlusion properly, unlike the other methods. Figure 8 shows another qualitative comparison, for various frames of the streetCornerAtNight sequence of the CD.net 2014 dataset, in which the results of our proposed method are compared with CP3-online [9] and SuBSENSE [12]. As shown, our proposed method handles motion detection under night lighting properly.
Figure 6: Comparison of moving object detection under occlusion for the proposed method and previous approaches (panels: frame, ground truth, CP3-online [9], IUTIS-3 [10], SuBSENSE [12], proposed method).
Fig. 7. Qualitative comparison of the segmentation results of different approaches for various frames of the 'winterStreet' sequence of the CD.net 2014 dataset (panels: frame, ground truth, CP3-online [9], IUTIS-3 [10], SuBSENSE [12], proposed method).
Moreover, Figure 9 shows a qualitative comparison of our proposed method with CP3-online [9], SuBSENSE [12], and AAPSA [16] for various frames of the busyBoulvard sequence of the CD.net 2014 dataset. As these results indicate, our proposed method handles illumination variations and occlusion more robustly. Figure 10 shows a qualitative comparison of our proposed method with CP3-online [9] and AAPSA [16] for various frames of some Thermal sequences of the CD.net 2014 dataset. The results indicate that our proposed method copes well with the difficulties caused by camouflage.
Figure 11 shows the motion detection results of CP3-online [9], SuBSENSE [12], and our proposed method for various frames of the 'blizzard', 'streetLight', and 'parking' sequences of the CD.net 2014 dataset. As the results indicate, the proposed method accurately segments distant and small moving objects.
In Figures 7, 8, 9, and 11, the gray segments in the last column indicate the mask extracted from the ground truth and multiplied with the results of the proposed method. The same mask is applied to the results of the other methods by setting their segmentation results in this region to zero.
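The masking step described above can be sketched as follows. This is a hedged illustration: it assumes the CD.net 2014 ground-truth convention (0 static, 50 hard shadow, 85 outside the region of interest, 170 unknown motion, 255 motion) and assumes that the "gray" region corresponds to the labels 85 and 170; which gray labels were actually used for these figures is not stated in the text, and `apply_gray_mask` is a hypothetical name.

```python
GRAY_LABELS = {85, 170}  # assumed: outside ROI and unknown motion in CD.net 2014


def apply_gray_mask(segmentation, ground_truth, gray_labels=GRAY_LABELS):
    """Zero the segmentation wherever the ground truth carries a gray label.

    Both arguments are 2D lists of equal size; segmentation holds 0/1 labels.
    """
    return [
        [0 if g in gray_labels else s for s, g in zip(seg_row, gt_row)]
        for seg_row, gt_row in zip(segmentation, ground_truth)
    ]


seg = [[1, 1, 0], [0, 1, 1]]
gt = [[255, 85, 0], [0, 170, 255]]
print(apply_gray_mask(seg, gt))  # [[1, 0, 0], [0, 0, 1]]
```

Applying the same mask to every method keeps the comparison fair, because pixels whose ground truth is undefined cannot count for or against any of them.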
Fig. 8. Qualitative comparison of the segmentation results of different approaches for various frames of the 'streetCornerAtNight' sequence of the CD.net 2014 dataset (panels: frame, ground truth, CP3-online [9], SuBSENSE [12], proposed method).
Fig. 9. Qualitative comparison of the segmentation results of different approaches for various frames of the busyBoulvard sequence of the CD.net 2014 dataset (panels: frame, ground truth, CP3-online [9], SuBSENSE [12], AAPSA [16], proposed method).
More comparisons between the results of the proposed method and previous approaches, without multiplying by the ground-truth masks, are available at http://dspl.ce.sharif.edu/motiondetector.html. Furthermore, Figure 12 shows a qualitative comparison of the motion detection results with CP3-online [9] for various frames of the busStation sequence of the CD.net 2014 dataset. Figure 13 shows another qualitative comparison of the motion detection results with MDT [33] for some frames of sequences from the UCSD pedestrian dataset. In this figure, the green segments represent motion and the yellow segments represent non-motion. The results indicate a noticeable improvement with our proposed method.
Fig. 10. Qualitative comparison of the segmentation results of different approaches for various frames of some 'Thermal' sequences (corridor and park) of the CD.net 2014 dataset (panels: original image, ground truth, AAPSA [16], CP3-online [9], proposed method).
Fig. 11. Qualitative comparison of the segmentation results of different approaches for some frames of the intermittentObjectMotion sequences of the CD.net 2014 dataset (rows: blizzard, streetLight, parking; panels: frame, ground truth, CP3-online [9], SuBSENSE [12], AAPSA [16], proposed method).
Figure 12: Qualitative comparison of the motion detection results of different approaches for various frames of the busStation sequence of the CD.net 2014 dataset (panels: frame, ground truth, CP3-online [9], proposed method).
Figure 13: Qualitative comparison of the motion detection results with MDT [33] for some frames of sequences from the UCSD pedestrian dataset (vidd1_33_000.y, vidd1_33_016.y, vidf1_33_000.y; panels: frame, MDT [33], proposed method); static segments are colored yellow and motion segments green.
Figure 14 shows a qualitative comparison of the motion detection results of the proposed method with CP3-online [9] for burst motion; the two examples contain a cathode-ray-tube display and a waving tree, respectively. As the results show, our proposed method handles burst motion detection properly. Finally, Figure 15 shows the motion detection results of our proposed method on various frames of sequences from the Dyntex dataset, where the segmentation results mark the static and dynamic regions with two different colors. The sequences contain dynamic textures such as smoke, fire, and waving water, and burst motions such as a disk drive and a washing machine. The results indicate that our proposed method can be used for dynamic texture segmentation in video sequences.
B. Quantitative Measurements
For quantitative comparison, the following evaluation metrics are used: Recall (Re), Specificity (Sp), False Positive Rate (FPR), False Negative Rate (FNR), Percentage of Wrong Classifications (PWC), F-measure, and Precision. Equations (8)-(14) define these measures in terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). Recall, defined as
$$Re=\frac{TP}{TP+FN},\qquad(8)$$
can be seen as the completeness of the moving objects. Specificity can be seen as the completeness of the background and is defined as
$$Sp=\frac{TN}{TN+FP}.\qquad(9)$$
The False Positive Rate is the rate of background incorrectly detected as foreground, and the False Negative Rate is the rate of foreground incorrectly detected as background; they are defined in equations (10) and (11), respectively:
$$FPR=\frac{FP}{FP+TN},\qquad(10)$$
$$FNR=\frac{FN}{TP+FN}.\qquad(11)$$
The Percentage of Wrong Classifications is the percentage of foreground and background pixels detected incorrectly and is defined as
$$PWC=100\cdot\frac{FN+FP}{TP+FN+FP+TN}.\qquad(12)$$
Finally, the F-measure is the harmonic mean of Precision and Recall and is defined as
$$F\text{-}measure=\frac{2\cdot Precision\cdot Re}{Precision+Re},\qquad(13)$$
in which Precision is defined as
$$Precision=\frac{TP}{TP+FP}.\qquad(14)$$
For these measures, zero is the best value for FPR, FNR, and PWC, while one is the best value for Recall, Specificity, Precision, and F-measure.
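The measures in equations (8) to (14) can be computed directly from the confusion counts. The following sketch assumes the counts have already been accumulated over all evaluated pixels; the function name `change_detection_metrics` and the example counts are hypothetical.

```python
def change_detection_metrics(tp, fp, fn, tn):
    """Return the measures of equations (8)-(14) from confusion counts."""
    recall = tp / (tp + fn)                           # (8)
    specificity = tn / (tn + fp)                      # (9)
    fpr = fp / (fp + tn)                              # (10)
    fnr = fn / (tp + fn)                              # (11)
    pwc = 100.0 * (fn + fp) / (tp + fn + fp + tn)     # (12)
    precision = tp / (tp + fp)                        # (14)
    f_measure = 2 * precision * recall / (precision + recall)  # (13)
    return {
        "Re": recall, "Sp": specificity, "FPR": fpr, "FNR": fnr,
        "PWC": pwc, "Precision": precision, "F-measure": f_measure,
    }


m = change_detection_metrics(tp=80, fp=10, fn=20, tn=890)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For these counts, Re = 0.8, FNR = 0.2, PWC = 3.0, and F-measure = 16/19 ≈ 0.8421. By construction FPR = 1 - Sp and FNR = 1 - Re, so each pair carries the same information.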
Fig. 14. Qualitative comparison of the segmentation results of the proposed method with CP3-online [9] for two frames of the Wallflower dataset (panels: frame, CP3-online [9], proposed method).
Fig. 15. Qualitative segmentation results of the proposed method for some frames of the video sequences in the Dyntex dataset.
C. Quantitative Comparison
As mentioned before, the most important parameter of the model is the size of the cubic patch used for computing the wavelet coefficients. In this section, we evaluate the proposed algorithm for the different cubic patch sizes listed in Table 1, which also gives the corresponding scale decomposition levels. Table 2 reports the average values of the measures Re, SP, FPR, FNR, PWC, Precision and F-measure, together with the average computational time in seconds, obtained by the proposed approach for volumes of different scales and sizes on eighteen video sequences from four categories of the CD.net 2014 dataset. The video sequences are highway (frame size 320×240), parking (320×240), streetlight (320×240), abandonedBox (432×288), winterDriveway (320×240), tramstop (432×288), sofa (320×240), park (352×288), lakeSide (320×240), corridor (320×240), diningRoom (320×240), library (320×240), bridgeEntry (630×430), busyBoulvard (640×364), fluidHighway (700×450), streetCornerAtNight (595×245), tramStation (480×295) and winterStreet (624×420), which belong to the NightVideos, Thermal, intermittentObjectMotion and Baseline categories of the CD.net 2014 dataset. The approach is implemented in MATLAB 8.3 and run on an Intel 3.30 GHz CPU with 4.00 GB of memory. As mentioned before, zero is the best value for the FPR, FNR and PWC measures, and one is the best value for Recall, Specificity, Precision and F-measure. The results indicate that the 4×4×4 patch size produces the best results, with an average computational time of 304.80 seconds.

Table 1. The sizes of the cubic patches and their decomposition scales used in the experiments.
Patch size   Scale   Decomposition levels
2×2×2        1       1×1×1
2×2×4        1       1×1×2
2×2×6        1       1×1×3
2×2×8        1       1×1×4
4×4×2        1       2×2×1
4×4×4        2       2×2×2→1×1×1
4×4×6        2       2×2×3→1×1×1
4×4×8        2       2×2×4→1×1×2
8×8×2        1       4×4×1
8×8×4        2       4×4×2→2×2×1
8×8×6        2       4×4×3→2×2×1
8×8×8        3       4×4×4→2×2×2→1×1×1

Table 3 presents a quantitative comparison with various approaches, namely CP3-online [9], IUTIS-3 [10], SUBSENSE [12], AAPSA [16], C-EFIC [34], GMM|Zivkovic [35], CwisarDH [4], SOBS_CF [2] and AMBER [36], on the frames of the eighteen video sequences of the CD.net 2014 dataset mentioned above. In this experiment, the cubic patch size is 4×4×4 and the decomposition scale is 2. As the results show, the average values of Re, Sp, FPR, FNR, PWC, Precision and F-measure of our proposed method over the four categories are 0.8209, 0.9392, 0.0608, 0.1719, 4.1139, 0.7884 and 0.7783, respectively. The Precision values in particular show a marked improvement over the previous approaches.

Figure 16-(a) illustrates a quantitative comparison of different measures between the proposed method and nine different unsupervised approaches on the video sequences of three categories of the CD.net 2014 dataset, namely intermittentObjectMotion, Thermal and NightVideos. The measures shown in this figure are RE, SP, Precision and F-Measure.
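The seven measures used in this section can be computed from the per-pixel confusion counts (true/false positives and negatives) of a binary motion mask against its ground truth. The Python sketch below assumes the standard CD.net benchmark [37] definitions, which match the measures and best values listed above; `confusion_counts` and `cdnet_measures` are hypothetical helper names, and the masks are assumed to be flattened 0/1 sequences rather than the authors' MATLAB data structures. It also makes explicit that FPR = 1 − Sp and FNR = 1 − Re hold for each sequence.

```python
def confusion_counts(pred, gt):
    """Count TP, FP, FN, TN for two equal-length flat sequences of 0/1 labels
    (1 = moving/foreground pixel, 0 = background pixel)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    tp = fp = fn = tn = 0
    for p, g in zip(pred, gt):
        if p and g:
            tp += 1      # motion correctly detected
        elif p:
            fp += 1      # background flagged as motion
        elif g:
            fn += 1      # motion missed
        else:
            tn += 1      # background correctly rejected
    return tp, fp, fn, tn


def cdnet_measures(pred, gt):
    """Return the seven CD.net-style measures for one binary mask.

    Assumes the ground truth contains both foreground and background pixels
    and the prediction contains at least one foreground pixel, so that no
    denominator is zero.
    """
    tp, fp, fn, tn = confusion_counts(pred, gt)
    re = tp / (tp + fn)                             # Recall: best value 1
    sp = tn / (tn + fp)                             # Specificity: best value 1
    fpr = fp / (fp + tn)                            # False Positive Rate (= 1 - Sp): best 0
    fnr = fn / (tp + fn)                            # False Negative Rate (= 1 - Re): best 0
    pwc = 100.0 * (fn + fp) / (tp + fn + fp + tn)   # % of wrong classifications: best 0
    precision = tp / (tp + fp)                      # best value 1
    # Harmonic mean of Precision and Recall; defined as 0 when both are 0.
    f_measure = (2 * precision * re / (precision + re)) if precision + re > 0 else 0.0
    return {"Re": re, "Sp": sp, "FPR": fpr, "FNR": fnr,
            "PWC": pwc, "Precision": precision, "F-measure": f_measure}


# Toy example: an 8-pixel frame containing 2 true motion pixels.
gt = [1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 0, 1, 0, 0, 0, 0, 0]
m = cdnet_measures(pred, gt)
print(m["PWC"], m["F-measure"])  # 25.0 0.5
```

Averaging such per-sequence values over the sequences of a category yields averages of the kind reported above.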
Table 2. The average of the different measures and the computational time for the different scales and patch sizes, over eighteen video sequences belonging to four categories of the CD.net 2014 dataset.
Patch size   Re   SP   FPR   FNR   PWC   Precision   F-measure   Computational time (s)

Fig. 16. Comparison of the proposed method with nine different approaches on all the video sequences of three categories: intermittentObjectMotion, Thermal and NightVideos. (a) RE, SP, Precision and F-Measure for each category (Cat1: intermittentObjectMotion, Cat2: Thermal, Cat3: NightVideos); (b) order of the approaches; (c) average RE, SP, Precision and F-Measure over the three categories.
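The rows of Table 2 index the patch sizes and scales of Table 1. The decomposition levels in Table 1 appear to follow a simple rule: each level of the 3D-DWT halves every patch dimension (rounding down), and the decomposition stops once the smallest dimension reaches 1, so the scale equals ⌊log2 of the smallest patch dimension⌋. The Python sketch below reproduces all twelve rows of Table 1 under that reading; the rule is inferred from the table rather than stated in this section, and `decomposition_schedule` and `fmt` are hypothetical helpers, not part of the authors' MATLAB implementation.

```python
def decomposition_schedule(patch):
    """Sizes of the low-pass sub-volume after each level of a dyadic 3D-DWT.

    Assumed rule (inferred from Table 1): every level halves each dimension
    with integer division, and decomposition stops once the smallest
    dimension reaches 1.
    """
    dims = tuple(patch)
    if len(dims) != 3 or min(dims) < 2:
        raise ValueError("expected three patch dimensions, each at least 2")
    levels = []
    while min(dims) > 1:
        dims = tuple(d // 2 for d in dims)
        levels.append(dims)
    return levels


def fmt(dims):
    """Format a size tuple in the paper's 'a×b×c' notation."""
    return "×".join(str(d) for d in dims)


# Reproduce the rows of Table 1, in the same order.
table1_patches = [(x, x, t) for x in (2, 4, 8) for t in (2, 4, 6, 8)]
for patch in table1_patches:
    schedule = decomposition_schedule(patch)
    print(f"{fmt(patch)}: scale {len(schedule)}, "
          f"{'→'.join(fmt(s) for s in schedule)}")
# e.g. the 4×4×4 row prints "4×4×4: scale 2, 2×2×2→1×1×1"
```

Integer division explains the less obvious rows, such as 4×4×6 → 2×2×3 → 1×1×1, where the odd temporal length 3 rounds down to 1.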
The approaches compared in Figure 16-(a) are CP3-online [9], IUTIS-3 [10], SUBSENSE [12], AAPSA [16], C-EFIC [34], GMM|Zivkovic [35], CwisarDH [4], SOBS_CF [2], AMBER [36] and 3D-DWT_MD. In this figure, the red horizontal bars show the values for the proposed method. Figure 16-(b) shows the order of the approaches in Figure 16-(a), taking RE on the second category (Thermal) as an example; the same order is repeated in the other parts of the figure. Figure 16-(c) shows the average of each measure over the three categories of Figure 16-(a), with the approaches in the same order. The RE, Precision and F-Measure values in this figure indicate that the proposed method outperforms the other approaches on these measures by a clear margin. More details about the measure values are available at http://dspl.ce.sharif.edu/motiondetector.html.

VI. CONCLUSIONS
In this paper, we proposed a novel motion detection method using spatial frequency descriptors based on the three-dimensional wavelet transform and the three-dimensional wavelet leader. Thanks to the ability of frequency-domain approaches to capture holistic motion-pattern information, the proposed method can effectively deal with the difficulties raised by illumination changes, camouflage and burst physical motion. Owing to this property of wavelet-based descriptors, the proposed method can also be used for dynamic texture segmentation. In addition, the proposed method performed well in detecting distant and tiny moving objects. To evaluate its performance, various qualitative and quantitative comparisons were carried out on four datasets: CD.net 2014, Wallflower, DynTex and UCSD pedestrian. The evaluation metrics Recall (Re), Specificity (Sp), False Positive Rate (FPR), False Negative Rate (FNR), Percentage of Wrong Classifications (PWC), Precision and F-measure were computed for the different video sequences. The results of these qualitative and quantitative comparisons demonstrate that our proposed approach outperforms its competitors, both in detecting motion and in segmenting dynamic textures properly.

REFERENCES
1. Takaya, K. Detection of scene changes for video indexing by means of the MPEG motion vectors. in 2006 International Symposium on Intelligent Signal Processing and Communications. 2006. IEEE.
2. Maddalena, L. and A. Petrosino, A fuzzy spatial coherence-based approach to background/foreground separation for moving object detection. Neural Computing and Applications, 2010. (2): p. 179-186.
3. Liang, D., et al. Co-occurrence-based adaptive background model for robust object detection. in Advanced Video and Signal Based Surveillance (AVSS), 2013 10th IEEE International Conference on. 2013. IEEE.
4. De Gregorio, M. and M. Giordano. Change detection with weightless neural networks. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
5. Lu, X. A multiscale spatio-temporal background model for motion detection. in 2014 IEEE International Conference on Image Processing (ICIP). 2014. IEEE.
6. Sedky, M., M. Moniri, and C.C. Chibelushi. Spectral-360: A physics-based technique for change detection. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
7. Wang, R., et al. Static and moving object detection using flux tensor with split gaussian models. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
8. Miron, A. and A. Badii. Change detection based on graph cuts. in 2015 International Conference on Systems, Signals and Image Processing (IWSSIP). 2015. IEEE.
9. Liang, D., et al., Co-occurrence probability-based pixel pairs background model for robust object detection in dynamic scenes. Pattern Recognition, 2015. (4): p. 1374-1390.
10. Bianco, S., G. Ciocca, and R. Schettini, How far can you get by combining change detection algorithms? arXiv preprint arXiv:1505.02921, 2015.
11. St-Charles, P.-L., G.-A. Bilodeau, and R. Bergevin. A self-adjusting approach to change detection based on background word consensus. in 2015 IEEE Winter Conference on Applications of Computer Vision. 2015. IEEE.
12. St-Charles, P.-L., G.-A. Bilodeau, and R. Bergevin, Subsense: A universal change detection method with local adaptive sensitivity. IEEE Transactions on Image Processing, 2015. (1): p. 359-373.
13. Ji, H., et al., Wavelet domain multifractal analysis for static and dynamic texture classification. IEEE Transactions on Image Processing, 2013. (1): p. 286-299.
14. Huang, J.-C. and W.-S. Hsieh, Wavelet-based moving object segmentation. Electronics Letters, 2003. (19): p. 1.
15. Töreyin, B.U., et al., Moving object detection in wavelet compressed video. Signal Processing: Image Communication, 2005. (3): p. 255-264.
16. Ramírez-Alonso, G. and M.I. Chacón-Murguía, Auto-Adaptive Parallel SOM Architecture with a modular analysis for dynamic object segmentation in videos. Neurocomputing, 2016: p. 990-1000.
17. St-Charles, P.-L., G.-A. Bilodeau, and R. Bergevin, Universal background subtraction using word consensus models. IEEE Transactions on Image Processing, 2016. (10): p. 4768-4781.
18. St-Charles, P.-L., G.-A. Bilodeau, and R. Bergevin, Universal Background Subtraction Using Word Consensus Models.
19. Jaffard, S., Wavelet techniques in multifractal analysis. 2004, DTIC Document.
20. Juhola, J., et al. Scan rate conversions using weighted median filtering. in Circuits and Systems, 1989., IEEE International Symposium on. 1989. IEEE.
21. Haavisto, P., J. Juhola, and Y. Neuvo, Scan rate up-conversion using adaptive weighted median filtering. 1990, Elsevier Sciences Publishers, Amsterdam. p. 703-710.
22. Xu, J., et al., Memory-constrained 3D wavelet transform for video coding without boundary effects. IEEE Transactions on Circuits and Systems for Video Technology, 2002. (9): p. 812-818.
23. Zhang, H., J.E. Fritts, and S.A. Goldman. A fast texture feature extraction method for region-based image segmentation. in Electronic Imaging 2005. 2005. International Society for Optics and Photonics.
24. Gubbi, J., S. Marusic, and M. Palaniswami, Smoke detection in video using wavelets and support vector machines. Fire Safety Journal, 2009. (8): p. 1110-1115.
25. Töreyin, B.U., et al., Computer vision based method for real-time fire and flame detection. Pattern Recognition Letters, 2006. (1): p. 49-58.
26. Demonceaux, C. and D. Kachi-Akkouche, Motion detection using wavelet analysis and hierarchical Markov models, in Spatial Coherence for Visual Motion Analysis. 2006, Springer. p. 64-75.
27. Abdelnour, A.F. and I.W. Selesnick. Nearly symmetric orthogonal wavelet bases. in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP). 2001.
28. Lashermes, B., S. Jaffard, and P. Abry. Wavelet leader based multifractal analysis. in Proceedings (ICASSP'05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. 2005. IEEE.
29. Wendt, H., et al. Wavelet leader multifractal analysis for texture classification. in 2009 16th IEEE International Conference on Image Processing (ICIP). 2009. IEEE.
30. Chen, X., et al., New image quality assessment method using wavelet leader pyramids. Optical Engineering, 2011. (6): p. 067011-067011-8.
31. Pustelnik, N., H. Wendt, and P. Abry. Local regularity for texture segmentation: Combining wavelet leaders and proximal minimization. in IEEE International Conference on Acoustics, Speech, and Signal Processing-ICASSP 2013. 2013.
32. Pustelnik, N., et al., Local regularity, wavelet leaders and total variation based procedures for texture segmentation. arXiv preprint arXiv:1504.05776, 2015.
33. Chan, A.B. and N. Vasconcelos, Modeling, clustering, and segmenting video with mixtures of dynamic textures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008. (5): p. 909-926.
34. Allebosch, G., et al. C-EFIC: Color and Edge Based Foreground Background Segmentation with Interior Classification. in International Joint Conference on Computer Vision, Imaging and Computer Graphics. 2015. Springer.
35. Zivkovic, Z. Improved adaptive Gaussian mixture model for background subtraction. in Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. 2004. IEEE.
36. Wang, B. and P. Dudek. A fast self-tuning background subtraction algorithm. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
37. Goyette, N., et al. Changedetection.net: A new change detection benchmark dataset. in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. 2012. IEEE.
38. Toyama, K., et al. Wallflower: Principles and practice of background maintenance. in Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on. 1999. IEEE.
39. Péteri, R., S. Fazekas, and M.J. Huiskes, DynTex: A comprehensive database of dynamic textures. Pattern Recognition Letters, 2010. 31