Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Qiuqiang Kong is active.

Publication


Featured research published by Qiuqiang Kong.


International Symposium on Neural Networks | 2017

Convolutional gated recurrent neural network incorporating spatial features for audio tagging

Yong Xu; Qiuqiang Kong; Qiang Huang; Wenwu Wang; Mark D. Plumbley

Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in an audio chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the development set. The spatial features can further reduce the EER to 0.10. The performance of end-to-end learning on raw waveforms is also comparable. Finally, on the evaluation set, we achieve state-of-the-art performance with 0.12 EER, while the best existing system achieves 0.15 EER.
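A minimal sketch of the pipeline described in this abstract, assuming PyTorch (not the authors' implementation): a small CNN front-end over mel-filter bank features, a bidirectional GRU for long-term temporal modelling, and one sigmoid output per tag. The layer sizes, the 40 mel bins and the 7 tags are illustrative placeholders, and the auxiliary spatial CNN for stereo recordings is omitted.

```python
# Sketch of a convolutional gated recurrent network for clip-level audio tagging.
# Assumes PyTorch; dimensions are illustrative, not the authors' configuration.
import torch
import torch.nn as nn

class ConvGRUTagger(nn.Module):
    def __init__(self, n_mels=40, n_tags=7):
        super().__init__()
        # CNN front-end: extracts robust local features from mel-filter banks.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),          # pool along frequency, keep time resolution
        )
        # GRU models the long-term temporal structure of the clip.
        self.gru = nn.GRU(input_size=32 * (n_mels // 2),
                          hidden_size=128,
                          batch_first=True,
                          bidirectional=True)
        # Clip-level multi-label output: one sigmoid per audio tag.
        self.fc = nn.Linear(2 * 128, n_tags)

    def forward(self, mel):                    # mel: (batch, time, n_mels)
        x = mel.unsqueeze(1)                   # (batch, 1, time, n_mels)
        x = self.cnn(x)                        # (batch, 32, time, n_mels // 2)
        x = x.permute(0, 2, 1, 3).flatten(2)   # (batch, time, 32 * n_mels // 2)
        x, _ = self.gru(x)                     # (batch, time, 256)
        x = x.mean(dim=1)                      # average over time -> clip embedding
        return torch.sigmoid(self.fc(x))       # tag probabilities

tagger = ConvGRUTagger()
probs = tagger(torch.randn(4, 240, 40))        # 4 clips, 240 frames, 40 mel bins
print(probs.shape)                             # torch.Size([4, 7])
```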


International Conference on Acoustics, Speech, and Signal Processing | 2017

A joint detection-classification model for audio tagging of weakly labelled data

Qiuqiang Kong; Yong Xu; Wenwu Wang; Mark D. Plumbley

Audio tagging aims to assign one or several tags to an audio clip. Most of the datasets are weakly labelled, which means only the clip-level tags are known, without the occurrence times of the tags. The labelling of an audio clip is often based on the audio events in the clip, and no event-level label is provided to the user. Previous works using the bag-of-frames model assume the tags occur all the time, which is not the case in practice. We propose a joint detection-classification (JDC) model to detect and classify the audio clip simultaneously. The JDC model has the ability to attend to informative sounds and ignore uninformative ones, so that only informative regions are used for classification. Experimental results on the “CHiME Home” dataset show that the JDC model reduces the equal error rate (EER) from 19.0% to 16.9%. More interestingly, the audio event detector is trained successfully without needing event-level labels.
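A minimal sketch of the joint detection-classification idea, assuming PyTorch; the per-frame encoder, layer sizes and the 7 tags are hypothetical placeholders rather than the paper's architecture. The classifier scores each frame per tag, the detector produces an attention weight per frame, and the clip-level prediction is the detection-weighted sum, so uninformative frames contribute little.

```python
# Sketch of a joint detection-classification (JDC) model for weakly labelled
# audio tagging. Assumes PyTorch; all sizes are illustrative placeholders.
import torch
import torch.nn as nn

class JDCTagger(nn.Module):
    def __init__(self, n_features=64, n_tags=7):
        super().__init__()
        self.encoder = nn.Linear(n_features, 128)     # per-frame embedding
        self.classifier = nn.Linear(128, n_tags)      # frame-level tag scores
        self.detector = nn.Linear(128, n_tags)        # frame-level informativeness

    def forward(self, frames):                        # frames: (batch, time, n_features)
        h = torch.relu(self.encoder(frames))
        cla = torch.sigmoid(self.classifier(h))       # (batch, time, n_tags)
        det = torch.softmax(self.detector(h), dim=1)  # attention over time, per tag
        # Clip-level prediction: classification weighted by detection, so only
        # informative frames drive the final tag probabilities.
        return (cla * det).sum(dim=1)                 # (batch, n_tags)

model = JDCTagger()
clip_probs = model(torch.randn(2, 100, 64))           # 2 clips, 100 frames each
print(clip_probs.shape)                               # torch.Size([2, 7])
```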


European Signal Processing Conference | 2017

Joint detection and classification convolutional neural network on weakly labelled bird audio detection

Qiuqiang Kong; Yong Xu; Mark D. Plumbley

Bird audio detection (BAD) aims to detect whether there is a bird call in an audio recording or not. One difficulty of this task is that bird sound datasets are weakly labelled, that is, only the presence or absence of a bird in a recording is known, without knowing when the birds call. We propose to apply a joint detection and classification (JDC) model to the weakly labelled data (WLD) to detect and classify an audio clip at the same time. First, we apply a VGG-like convolutional neural network (CNN) on the mel spectrogram as a baseline. Then we propose a JDC-CNN model with VGG as a classifier and a CNN as a detector. We report that denoising methods, including optimally-modified log-spectral amplitude (OM-LSA), median filtering and spectrogram filtering, worsen the classification accuracy, contrary to previous work. The JDC-CNN can predict the time stamps of events from weakly labelled data, so it is able to perform sound event detection from WLD. We obtained an area under the curve (AUC) of 95.70% on the development data and 81.36% on the unseen evaluation data, which is nearly comparable to the baseline CNN model.
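A minimal sketch of a JDC-style CNN for this binary task, assuming PyTorch. For brevity the classifier and detector here share one tiny convolutional trunk, whereas the paper uses a VGG-like classifier and a separate CNN detector; all sizes are illustrative. Frame-level probabilities provide the event time stamps, while the detector's attention aggregates them into a single clip-level bird/no-bird score.

```python
# Sketch of a JDC-style CNN for weakly labelled bird audio detection.
# Assumes PyTorch; channel counts and pooling are illustrative placeholders.
import torch
import torch.nn as nn

class JDCBirdDetector(nn.Module):
    def __init__(self, n_mels=40):
        super().__init__()
        # Shared CNN over the mel spectrogram (stands in for the VGG-like trunk).
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                      # pool frequency, keep time resolution
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        feat = 32 * (n_mels // 16)
        self.classifier = nn.Linear(feat, 1)           # frame-level bird score
        self.detector = nn.Linear(feat, 1)             # frame-level attention weight

    def forward(self, mel):                            # mel: (batch, time, n_mels)
        x = self.trunk(mel.unsqueeze(1))               # (batch, 32, time, n_mels // 16)
        x = x.permute(0, 2, 1, 3).flatten(2)           # (batch, time, feat)
        frame_prob = torch.sigmoid(self.classifier(x)) # time stamps come from these
        attn = torch.softmax(self.detector(x), dim=1)  # where to listen in time
        clip_prob = (frame_prob * attn).sum(dim=1)     # (batch, 1) clip-level decision
        return clip_prob.squeeze(1), frame_prob.squeeze(2)

model = JDCBirdDetector()
clip, frames = model(torch.randn(2, 500, 40))          # 2 clips, 500 frames, 40 mel bins
print(clip.shape, frames.shape)                        # torch.Size([2]) torch.Size([2, 500])
```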


Archive | 2016

Deep Neural Network Baseline for DCASE Challenge 2016

Qiuqiang Kong; Iwona Sobieraj; Wenwu Wang; Mark D. Plumbley


Conference of the International Speech Communication Association | 2017

Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging

Yong Xu; Qiuqiang Kong; Qiang Huang; Wenwu Wang; Mark D. Plumbley


International Conference on Acoustics, Speech, and Signal Processing | 2018

Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network

Yong Xu; Qiuqiang Kong; Wenwu Wang; Mark D. Plumbley


International Conference on Acoustics, Speech, and Signal Processing | 2018

Audio Set classification with attention model: a probabilistic perspective

Qiuqiang Kong; Yong Xu; Wenwu Wang; Mark D. Plumbley


International Conference on Acoustics, Speech, and Signal Processing | 2018

A Joint Separation-Classification Model for Sound Event Detection of Weakly Labelled Data

Qiuqiang Kong; Yong Xu; Wenwu Wang; Mark D. Plumbley


arXiv: Sound | 2018

Capsule Routing for Sound Event Detection

Turab Iqbal; Yong Xu; Qiuqiang Kong; Wenwu Wang


arXiv: Sound | 2018

DCASE 2018 Challenge baseline with convolutional neural networks

Qiuqiang Kong; Turab Iqbal; Yong Xu; Wenwu Wang; Mark D. Plumbley

Collaboration


Dive into Qiuqiang Kong's collaborations.

Top Co-Authors

Yong Xu
University of Science and Technology of China