Qiuqiang Kong
University of Surrey
Publication
Featured research published by Qiuqiang Kong.
international symposium on neural networks | 2017
Yong Xu; Qiuqiang Kong; Qiang Huang; Wenwu Wang; Mark D. Plumbley
Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in an audio chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn spatial features from stereo recordings. We evaluate the proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure reduces the equal error rate (EER) from 0.13 to 0.11 on the development set, and the spatial features further reduce the EER to 0.10. The performance of end-to-end learning on raw waveforms is comparable. Finally, on the evaluation set, we achieve state-of-the-art performance with an EER of 0.12, compared with 0.15 for the best existing system.
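A minimal sketch of the cascaded CNN-GRU tagger described in this abstract, written in PyTorch for illustration; the layer widths, the 40 mel bands and the seven-tag output are assumptions rather than the paper's exact configuration, and the auxiliary spatial-feature branch is omitted:

```python
import torch
import torch.nn as nn

class CRNNTagger(nn.Module):
    """CNN front end over mel-filter-bank features, cascaded with a
    bidirectional GRU for long-term temporal structure, then averaged
    over time to produce clip-level tag probabilities."""

    def __init__(self, mel_bins=40, num_tags=7):
        super().__init__()
        # CNN extracts robust local time-frequency features.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),   # pool along frequency only
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        # GRU models the long-term temporal structure of the clip.
        self.gru = nn.GRU(64 * (mel_bins // 4), 128,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(256, num_tags)

    def forward(self, x):                        # (batch, time, mel_bins)
        x = self.conv(x.unsqueeze(1))            # (batch, 64, time, mel/4)
        x = x.permute(0, 2, 1, 3).flatten(2)     # (batch, time, features)
        x, _ = self.gru(x)                       # (batch, time, 256)
        x = torch.sigmoid(self.fc(x))            # frame-wise tag probabilities
        return x.mean(dim=1)                     # average to clip level
```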
international conference on acoustics, speech, and signal processing | 2017
Qiuqiang Kong; Yong Xu; Wenwu Wang; Mark D. Plumbley
Audio tagging aims to assign one or several tags to an audio clip. Most datasets are weakly labelled: only the tags of a clip are known, not the occurrence times of the tags. The labelling of an audio clip is often based on the audio events in the clip, and no event-level label is provided to the user. Previous work using the bag-of-frames model assumes the tags occur all the time, which is not the case in practice. We propose a joint detection-classification (JDC) model to detect and classify an audio clip simultaneously. The JDC model can attend to informative sounds and ignore uninformative ones, so that only informative regions are used for classification. Experimental results on the “CHiME Home” dataset show that the JDC model reduces the equal error rate (EER) from 19.0% to 16.9%. More interestingly, the audio event detector is trained successfully without needing event-level labels.
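The detection-classification coupling can be sketched as follows; this is an illustrative PyTorch reading of the idea, assuming sigmoid frame-wise classifiers weighted by a softmax-over-time detector, with all names and dimensions hypothetical:

```python
import torch
import torch.nn as nn

class JDCHead(nn.Module):
    """Joint detection-classification pooling over frame embeddings.
    The classifier predicts frame-wise tag probabilities; the detector
    predicts frame-wise attention weights; the clip-level output is the
    detector-weighted average, so uninformative frames are ignored."""

    def __init__(self, feat_dim=256, num_tags=7):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_tags)
        self.detector = nn.Linear(feat_dim, num_tags)

    def forward(self, h):                             # (batch, time, feat)
        cla = torch.sigmoid(self.classifier(h))       # frame probabilities
        det = torch.softmax(self.detector(h), dim=1)  # weights over time
        clip = (cla * det).sum(dim=1)                 # (batch, num_tags)
        return clip, cla, det

# Weak-label training uses clip-level binary cross-entropy only:
# clip, cla, det = JDCHead()(frame_embeddings)
# loss = nn.functional.binary_cross_entropy(clip, clip_labels)
```

Because the clip prediction is a weighted average of frame predictions, training with clip-level loss alone still pushes the detector to locate the informative frames, which is how event detection emerges from weak labels.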
european signal processing conference | 2017
Qiuqiang Kong; Yong Xu; Mark D. Plumbley
Bird audio detection (BAD) aims to detect whether there is a bird call in an audio recording or not. One difficulty of this task is that bird sound datasets are weakly labelled: only the presence or absence of a bird in a recording is known, not when the birds call. We propose to apply a joint detection and classification (JDC) model to the weakly labelled data (WLD) to detect and classify an audio clip at the same time. First, we apply a VGG-like convolutional neural network (CNN) on the mel spectrogram as a baseline. Then we propose a JDC-CNN model with a VGG-like network as the classifier and a CNN as the detector. Contrary to previous work, we report that denoising methods, including optimally-modified log-spectral amplitude (OM-LSA), median filtering and spectrogram denoising, worsen the classification accuracy. JDC-CNN can predict the time stamps of events from weakly labelled data, so it is able to perform sound event detection from WLD. We obtain an area under the curve (AUC) of 95.70% on the development data and 81.36% on the unseen evaluation data, nearly comparable to the baseline CNN model.
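For context, a minimal sketch of this evaluation setting, assuming log mel spectrogram input features (computed here with librosa) and clip-level scoring by area under the ROC curve via scikit-learn; the sample rate and mel-band count are illustrative assumptions:

```python
import numpy as np
import librosa
from sklearn.metrics import roc_auc_score

def log_mel(path, sr=16000, n_mels=40):
    """Log mel spectrogram of one recording, shape (time, n_mels)."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel).T

# Clip-level scoring: each recording gets a bird-presence score in [0, 1]
# and the task is ranked by area under the ROC curve (AUC).
labels = np.array([1, 0, 1, 1, 0])            # bird present / absent
scores = np.array([0.9, 0.2, 0.7, 0.6, 0.1])  # model outputs (dummy values)
print(roc_auc_score(labels, scores))
```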
Archive | 2016
Qiuqiang Kong; Iwona Sobieraj; Wenwu Wang; Mark D. Plumbley
conference of the international speech communication association | 2017
Yong Xu; Qiuqiang Kong; Qiang Huang; Wenwu Wang; Mark D. Plumbley
international conference on acoustics, speech, and signal processing | 2018
Yong Xu; Qiuqiang Kong; Wenwu Wang; Mark D. Plumbley
international conference on acoustics, speech, and signal processing | 2018
Qiuqiang Kong; Yong Xu; Wenwu Wang; Mark D. Plumbley
international conference on acoustics, speech, and signal processing | 2018
Qiuqiang Kong; Yong Xu; Wenwu Wang; Mark D. Plumbley
arXiv: Sound | 2018
Turab Iqbal; Yong Xu; Qiuqiang Kong; Wenwu Wang
arXiv: Sound | 2018
Qiuqiang Kong; Turab Iqbal; Yong Xu; Wenwu Wang; Mark D. Plumbley