Jianjun He
Nanyang Technological University
Publication
Featured research published by Jianjun He.
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Jianjun He; Ee-Leng Tan; Woon-Seng Gan
Audio signals for moving pictures and video games are often linear combinations of primary and ambient components. In spatial audio analysis-synthesis, these mixed signals are usually decomposed into primary and ambient components to facilitate flexible spatial rendering and enhancement. Existing approaches such as principal component analysis (PCA) and least squares (LS) are widely used to perform this decomposition from stereo signals. However, the performance of these approaches in primary-ambient extraction (PAE) has not been well studied and no comparative analysis among the existing approaches has been carried out so far. In this paper, we generalize the existing approaches into a linear estimation framework. Under this framework, we propose a series of performance measures to identify the components that contribute to the extraction error. Based on the generalized linear estimation framework and our proposed performance measures, a comparative study and experimental testing of the linear estimation based PAE approaches including existing PCA, LS, and three proposed variant LS approaches are presented.
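The PCA-based decomposition the abstract refers to can be sketched as follows: the primary component of a stereo frame is the projection onto the principal eigenvector of the 2x2 channel covariance, and the ambient component is the residual. This is a minimal illustrative sketch, not the paper's generalized linear estimation framework or its LS variants.

```python
import numpy as np

def pca_pae(x0, x1):
    """Primary-ambient extraction of a stereo frame via PCA (a sketch).

    The primary component is the rank-1 projection of the stereo signal
    onto the principal eigenvector of the 2x2 channel covariance matrix;
    the ambient component is the residual.
    """
    X = np.vstack([x0, x1])             # 2 x N stereo frame
    R = X @ X.T                         # 2 x 2 covariance (unnormalised)
    eigvals, eigvecs = np.linalg.eigh(R)
    v = eigvecs[:, np.argmax(eigvals)]  # principal eigenvector
    primary = np.outer(v, v @ X)        # rank-1 primary estimate, per channel
    ambient = X - primary               # residual = ambient estimate
    return primary, ambient
```

By construction the two estimates sum back to the input, so the decomposition is lossless; extraction error shows up as primary energy leaking into the ambient estimate and vice versa.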
International Conference on Acoustics, Speech, and Signal Processing | 2014
Jianjun He; Woon-Seng Gan; Ee-Leng Tan
Primary-ambient extraction (PAE) has been playing an important role in spatial audio analysis-synthesis. Based on spatial features, PAE decomposes a signal into primary and ambient components, which are then rendered separately. PAE is performed in the subband domain for complex input signals having multiple point-like sound sources. However, the performance of PAE approaches on such signals, and the key factors that influence it, have not been well studied so far. In this paper, we conducted a study on frequency-domain PAE using principal component analysis (PCA) in the case of multiple sources. We found that the partitioning of the frequency bins is critical in PAE. Simulation results reveal that the proposed top-down adaptive partitioning method achieves superior performance compared to conventional partitioning methods.
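Per-partition frequency-domain PCA can be sketched as below: the complex STFT bins of each partition are pooled and decomposed with the same 2x2 eigen-analysis as in time-domain PCA. The fixed partitions here are purely illustrative; the paper's top-down adaptive partitioning is not reproduced.

```python
import numpy as np

def partitioned_pca_pae(S0, S1, partitions):
    """Frequency-domain PCA-based PAE applied per bin partition (a sketch).

    S0, S1     : complex STFT spectra of the two channels, shape (bins, frames).
    partitions : list of slices grouping adjacent frequency bins.
    """
    P0, P1 = np.zeros_like(S0), np.zeros_like(S1)
    for band in partitions:
        X = np.vstack([S0[band].ravel(), S1[band].ravel()])  # 2 x (bins*frames)
        R = X @ X.conj().T                                   # 2x2 Hermitian covariance
        w, V = np.linalg.eigh(R)
        v = V[:, np.argmax(w)]                               # principal eigenvector
        prim = np.outer(v, v.conj() @ X)                     # rank-1 primary estimate
        P0[band] = prim[0].reshape(S0[band].shape)
        P1[band] = prim[1].reshape(S1[band].shape)
    return (P0, P1), (S0 - P0, S1 - P1)
```

With one source per partition the rank-1 model fits well; when several sources share a partition, the single principal direction can no longer represent them all, which is why the bin partitioning matters.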
International Conference on Acoustics, Speech, and Signal Processing | 2013
Jianjun He; Ee-Leng Tan; Woon-Seng Gan
In spatial audio analysis-synthesis, one of the key issues is to decompose a signal into cue and ambient components based on their spatial features. Principal component analysis (PCA) has been widely employed in cue extraction. However, the performance of PCA-based cue extraction is highly dependent on the assumptions of the input signal model. One of these assumptions is that the input signal contains cues that are highly correlated at zero lag; this assumption is often unmet. To overcome this problem, time-shifted PCA is proposed in this paper, which time-shifts the input signal according to its estimated inter-channel time difference (ITD) before cue extraction. From our simulation and listening test results, the proposed method is found to be superior to the conventional PCA-based cue extraction method.
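The time-shifting idea can be sketched as follows: estimate the ITD from the cross-correlation peak, align one channel, run plain PCA extraction, and undo the shift on the outputs. `np.roll` makes the shift circular, an assumption made here purely for brevity.

```python
import numpy as np

def estimate_itd(x0, x1, max_lag):
    """Estimate the inter-channel time difference as the cross-correlation peak."""
    lags = range(-max_lag, max_lag + 1)
    corr = [np.dot(x0, np.roll(x1, lag)) for lag in lags]
    return list(lags)[int(np.argmax(corr))]

def shifted_pca_pae(x0, x1, max_lag=64):
    """Time-shifted PCA (a sketch): align the channels at the estimated ITD,
    run PCA-based extraction, then undo the shift on the channel-1 outputs."""
    lag = estimate_itd(x0, x1, max_lag)
    x1s = np.roll(x1, lag)                  # align channel 1 to channel 0
    X = np.vstack([x0, x1s])
    w, V = np.linalg.eigh(X @ X.T)
    v = V[:, np.argmax(w)]
    primary = np.outer(v, v @ X)
    ambient = X - primary
    primary[1] = np.roll(primary[1], -lag)  # undo the alignment
    ambient[1] = np.roll(ambient[1], -lag)
    return primary, ambient
```

Without the alignment step, a delayed cue is only weakly correlated at zero lag and PCA leaks much of it into the ambient estimate; aligning first restores the high zero-lag correlation that PCA assumes.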
IEEE Transactions on Audio, Speech, and Language Processing | 2015
Jianjun He; Woon-Seng Gan; Ee-Leng Tan
The diversity of today's playback systems requires a flexible, efficient, and immersive reproduction of sound scenes in digital media. Spatial audio reproduction based on primary-ambient extraction (PAE) fulfills this objective, where accurate extraction of primary and ambient components from sound mixtures in channel-based audio is crucial. Severe extraction error was found in existing PAE approaches when dealing with sound mixtures that contain a relatively strong ambient component, a commonly encountered case in the sound scenes of digital media. In this paper, we propose a novel ambient spectrum estimation (ASE) framework to improve the performance of PAE. The ASE framework exploits the equal magnitude of the uncorrelated ambient components in two channels of a stereo signal, and reformulates the PAE problem into the problem of estimating either ambient phase or magnitude. In particular, we take advantage of the sparse characteristic of the primary components to derive sparse solutions for ASE based PAE, together with an approximate solution that can significantly reduce the computational cost. Our objective and subjective experimental results demonstrate that the proposed ASE approaches significantly outperform existing approaches, especially when the ambient component is relatively strong.
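The reformulation described in the abstract rests on the usual per-bin stereo signal model; a hedged sketch of the relations involved (notation mine, not the paper's):

```latex
% Per time-frequency bin: primary component amplitude-panned by k,
% plus uncorrelated ambient components of equal magnitude:
X_0 = P_0 + A_0, \qquad X_1 = k\,P_0 + A_1, \qquad |A_0| = |A_1| = |A|.
% Eliminating the primary component gives
X_1 - k\,X_0 = A_1 - k\,A_0,
% so under the equal-magnitude constraint, estimating either the ambient
% magnitude |A| or the ambient phases pins down the remaining unknowns --
% hence "ambient spectrum estimation".
```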
IEEE Signal Processing Letters | 2015
Jianjun He; Woon-Seng Gan; Ee-Leng Tan
Spatial audio reproduction addresses the growing commercial need to recreate an immersive listening experience of digital media content, such as movies and games. Primary-ambient extraction (PAE) is one of the key approaches to facilitate flexible and optimal rendering in spatial audio reproduction. Existing approaches, such as principal component analysis and time-frequency masking, often suffer from severe extraction error. This problem is more evident when the sound scene contains a relatively strong ambient component, which is frequently encountered in digital media. In this Letter, we propose a novel PAE approach by estimating the ambient phase with a sparsity constraint (APES). This approach exploits the equal magnitude of the uncorrelated ambient components in the two channels of a stereo signal and reformulates the PAE problem as an ambient phase estimation problem, which is then solved using the criterion that the primary component is sparse. Our experimental results demonstrate that the proposed approach significantly outperforms existing approaches, especially when the ambient component is relatively strong.
International Conference on Acoustics, Speech, and Signal Processing | 2016
Jianjun He; Rishabh Ranjan; Woon-Seng Gan
The head-related transfer function (HRTF) is widely used in 3D audio reproduction, especially over headphones. Conventionally, an HRTF database is acquired at discrete directions, and the acquisition process is time-consuming. Recent works have improved HRTF acquisition efficiency via continuous acquisition. However, these HRTF acquisition techniques still require the subject to sit still (with limited head movement) in a rotating chair. In this paper, we further relax the head-movement constraint during acquisition by using a head tracker. The proposed continuous HRTF acquisition technique relies on the activation-based normalized least-mean-square (ANLMS) algorithm to extract HRTFs on the fly. Experimental results validated the accuracy of the proposed technique when compared with the standard static acquisition technique.
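The adaptive core of such on-the-fly extraction is a normalized LMS system identification loop; a minimal sketch is below. This shows plain NLMS identifying an unknown FIR response (e.g. a head-related impulse response) from a known excitation; the activation-based gating that distinguishes ANLMS is not modelled here.

```python
import numpy as np

def nlms_identify(x, d, taps, mu=0.5, eps=1e-8):
    """Identify an unknown FIR response from input x and measured output d
    via normalized LMS (a sketch; the activation gating of ANLMS is omitted)."""
    w = np.zeros(taps)
    for n in range(taps, len(x)):
        u = x[n - taps + 1:n + 1][::-1]   # most recent taps samples, newest first
        e = d[n] - w @ u                  # a-priori estimation error
        w += mu * e * u / (u @ u + eps)   # normalized gradient update
    return w
```

With a persistently exciting input (e.g. white noise) the weight vector converges to the true response; in continuous acquisition the estimate is simply read out as the head sweeps through directions.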
International Conference on Acoustics, Speech, and Signal Processing | 2015
Jianjun He; Woon-Seng Gan; Ee-Leng Tan
Individualization of head-related transfer functions (HRTFs) can be realized using a person's anthropometry with a pretrained model. This model usually establishes a direct linear or non-linear mapping from anthropometry to HRTFs in the training database. Due to the complex relation between anthropometry and HRTFs, the accuracy of this model depends heavily on the correct selection of anthropometric features. To alleviate this problem and improve the accuracy of HRTF individualization, an indirect HRTF individualization framework was proposed recently, where HRTFs are synthesized using a sparse representation trained from the anthropometric features. In this paper, we extend the study of this framework by investigating the effects of different preprocessing and postprocessing methods on HRTF individualization. Our experimental results showed that preprocessing and postprocessing methods are crucial for achieving accurate HRTF individualization.
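The indirect framework the paper builds on can be sketched as: find sparse weights that reconstruct the new subject's anthropometry from the training subjects' anthropometry, then apply the same weights to the training HRTFs. The ISTA solver and all variable names below are my own illustrative choices, and the pre/post-processing steps that are the paper's actual focus are omitted.

```python
import numpy as np

def ista_lasso(A, y, lam=0.1, iters=500):
    """Minimise 0.5*||A w - y||^2 + lam*||w||_1 by ISTA (a sketch)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        g = w - A.T @ (A @ w - y) / L      # gradient step
        w = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return w

def individualize_hrtf(anthro_train, hrtf_train, anthro_new, lam=0.1):
    """Sparse-representation HRTF individualization (illustrative sketch).

    anthro_train : features x subjects matrix of training anthropometry.
    hrtf_train   : hrtf-dimension x subjects matrix of training HRTFs.
    anthro_new   : anthropometric feature vector of the new subject.
    """
    w = ista_lasso(anthro_train, anthro_new, lam)  # sparse subject weights
    return hrtf_train @ w                          # same weights applied to HRTFs
```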
International Conference on Acoustics, Speech, and Signal Processing | 2015
Jianjun He; Woon-Seng Gan
In spatial audio analysis-synthesis, one of the key issues is to decompose a signal into primary and ambient components based on their spatial features. Principal component analysis (PCA) has been widely employed in primary component extraction, and shifted PCA (SPCA) is employed to enhance the primary extraction for input signals involving the inter-channel time difference. However, SPCA generally requires the primary components to come from one direction and cannot produce good results in the case of multiple directions. To solve this problem, we propose multi-shift PCA (MSPCA) by extending SPCA to multiple shifts. Two structures of MSPCA with different weighting methods are discussed. From the results of our simulations and listening tests, the proposed consecutive MSPCA with proper weighting is found to be superior to the conventional PCA and SPCA based primary extraction methods.
IEEE Transactions on Audio, Speech, and Language Processing | 2015
Jianjun He; Woon-Seng Gan; Ee-Leng Tan
One of the key issues in spatial audio analysis and reproduction is to decompose a signal into primary and ambient components based on their directional and diffuse spatial features, respectively. Existing approaches employed in primary-ambient extraction (PAE), such as principal component analysis (PCA), are mainly based on a basic stereo signal model. The performance of these PAE approaches has not been well studied for the input signals that do not satisfy all the assumptions of the stereo signal model. In practice, one such case commonly encountered is that the primary components of the stereo signal are partially correlated at zero lag, referred to as the primary-complex case. In this paper, we take PCA as a representative of existing PAE approaches and investigate the performance degradation of PAE with respect to the correlation of the primary components in the primary-complex case. A time-shifting technique is proposed in PAE to alleviate the performance degradation due to the low correlation of the primary components in such stereo signals. This technique involves time-shifting the input signal according to the estimated inter-channel time difference of the primary component prior to the signal decomposition using conventional PAE approaches. To avoid the switching artifacts caused by the varied time-shifting in successive time frames, overlapped output mapping is suggested. Based on the results from our experiments, PAE approaches with the proposed time-shifting technique are found to be superior to the conventional PAE approaches in terms of extraction accuracy and spatial accuracy.
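The overlapped output mapping mentioned in the abstract amounts to cross-fading per-frame outputs so that frame-to-frame changes in the applied time shift do not switch abruptly; a minimal windowed overlap-add sketch (the Hann window and normalization are my own illustrative choices):

```python
import numpy as np

def overlap_add_outputs(frames, hop):
    """Blend per-frame extraction outputs by windowed overlap-add (a sketch
    of the overlapped-output idea): each frame is weighted by a Hann window
    so successive frames, whose time shifts may differ, cross-fade smoothly
    instead of switching abruptly at frame boundaries."""
    n = len(frames[0])
    win = np.hanning(n)
    out = np.zeros(hop * (len(frames) - 1) + n)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n] += win * f    # windowed contribution
        norm[i * hop:i * hop + n] += win       # window-sum for normalization
    return out / np.maximum(norm, 1e-12)       # avoid division by zero at edges
```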
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2015
Jianjun He; Woon-Seng Gan
Spatial audio reproduction is essential to creating a natural listening experience with digital media. The majority of legacy audio content is in a channel-based format, which is tightly coupled to a particular playback system. Considering the diversity of today's playback systems, the quality of reproduced sound scenes degrades significantly when the audio content and the playback system do not match. An active sound control approach is required to take the playback system into consideration. Primary-ambient extraction (PAE) is an emerging technique that can be employed to solve this pressing issue and achieve efficient, flexible, and immersive spatial audio reproduction. In this paper, we review recent advancements in PAE. A unified framework extending existing PAE approaches is proposed to improve the robustness of PAE when dealing with complex audio signals in practical situations. Various practical issues in implementing PAE in spatial audio applications are discussed. Objective and subjective evaluations are conducted to validate the feasibility of applying PAE in spatial audio reproduction.