Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Francis F. Li is active.

Publication


Featured research published by Francis F. Li.


Journal of the Acoustical Society of America | 2008

Monaural room acoustic parameters from music and speech

Paul Kendrick; Trevor J. Cox; Francis F. Li; Yonggang Zhang; Jonathon A. Chambers

This paper compares two methods for extracting room acoustic parameters from reverberated speech and music. An approach which uses statistical machine learning, previously developed for speech, is extended to work with music. For speech, reverberation time estimations are within a perceptual difference limen of the true value. For music, virtually all early decay time estimations are within a difference limen of the true value. The estimation accuracy is not good enough in other cases due to differences between the simulated data set used to develop the empirical model and real rooms. The second method carries out a maximum likelihood estimation on decay phases at the end of notes or speech utterances. This paper extends the method to estimate parameters relating to the balance of early and late energies in the impulse response. For reverberation time and speech, the method provides estimations which are within the perceptual difference limen of the true value. For other parameters such as clarity, the estimations are not sufficiently accurate due to the natural reverberance of the excitation signals. Speech is a better test signal than music because of the greater periods of silence in the signal, although music is needed for low frequency measurement.
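
The decay-based idea can be illustrated with a minimal sketch. In place of the paper's maximum-likelihood estimator, this toy example fits a straight line to the log-energy of a synthetic free decay and reads off the reverberation time; the sample rate, decay shape, and fitting method are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def estimate_rt60_from_decay(decay, fs):
    """Estimate reverberation time from a free-decay segment by fitting a
    line to its log-energy envelope (a simplified stand-in for an ML fit)."""
    energy_db = 20.0 * np.log10(np.abs(decay) + 1e-12)
    t = np.arange(len(decay)) / fs
    slope, _ = np.polyfit(t, energy_db, 1)  # decay rate in dB per second
    return -60.0 / slope                    # time for a 60 dB drop

fs = 8000
rt_true = 0.5                               # seconds
t = np.arange(int(fs * rt_true)) / fs
decay = np.exp(-6.91 * t / rt_true)         # ideal 60 dB decay over rt_true
print(round(estimate_rt60_from_decay(decay, fs), 2))  # 0.5
```

Real decay phases at the ends of notes or utterances are noisy and truncated, which is why the paper needs a statistical estimator rather than a plain line fit.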


Applied Soft Computing | 2007

A neural network model for speech intelligibility quantification

Francis F. Li; Trevor J. Cox

A neural network based model is developed to quantify speech intelligibility by blind-estimating speech transmission index, an objective rating index for speech intelligibility of transmission channels, from transmitted speech signals without resort to knowledge of original speech signals. It consists of a Hilbert transform processor for speech envelope detection, a Welch average periodogram algorithm for envelope spectrum estimation, a principal components analysis (PCA) network for speech feature extraction and a multi-layer back-propagation network for non-linear mapping and case generalisation. The developed model circumvents the use of artificial test signals by exploiting naturally occurring speech signals as probe stimuli, reduces measurement channels from two to one and hence facilitates in situ assessment of speech intelligibility. From a cognitive science viewpoint, the proposed method might be viewed as a successful paradigm of mimicking human perception of speech intelligibility using a hybrid model built around artificial neural networks.
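
The front end of this pipeline (Hilbert envelope detection followed by Welch envelope-spectrum estimation) can be sketched as follows; the test signal, sample rate, and FFT settings are illustrative assumptions, and the PCA and back-propagation stages are omitted.

```python
import numpy as np
from scipy.signal import hilbert, welch

def envelope_spectrum(x, fs, nperseg=8192):
    """Hilbert-transform envelope detection followed by a Welch
    average-periodogram estimate of the modulation spectrum."""
    env = np.abs(hilbert(x))   # amplitude envelope via the analytic signal
    env = env - np.mean(env)   # remove DC before spectral analysis
    return welch(env, fs=fs, nperseg=nperseg)

fs = 16000
t = np.arange(fs) / fs
# 1 kHz carrier, amplitude-modulated at 4 Hz (a speech-like syllabic rate)
x = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
f, pxx = envelope_spectrum(x, fs)
print(f[np.argmax(pxx)])  # spectral peak near the 4 Hz modulation rate
```

The modulation spectrum of transmitted speech carries the reverberation and noise signature that the PCA and back-propagation stages then map to a speech transmission index.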


International Conference on Bioinformatics and Biomedicine | 2014

Using independent component analysis to obtain feature space for reliable ECG Arrhythmia classification

Mohammad Sarfraz; Ateeq Ahmed Khan; Francis F. Li

Electrocardiograms (ECGs) reflect the activities of the human heart and reveal hidden information on its structure and behaviour. The information is extracted to gain insights that assist the explanation and identification of diverse pathological conditions. This was traditionally done by an expert through visual inspection of ECGs. The complexity and tedium of this task hinder long-term monitoring and rapid diagnosis; computerised and automated ECG signal processing is therefore sought after. In this paper an algorithm that uses independent component analysis (ICA) to improve the performance of ECG pattern recognition is proposed. The algorithm deploys the basis functions obtained via ICA of typical ECGs to extract ICA features of ECG signals for further pattern recognition, with the hypothesis that components of an ECG signal generated by different parts of the heart during normal and arrhythmic cardiac cycles might be independent. The features obtained via ICA, together with the R-R interval and QRS segment power, are jointly used as the input to a machine learning classifier, an artificial neural network in this case. Results from training and validation on the MIT-BIH Arrhythmia database show significantly improved performance in terms of recognition accuracy. The new method also allows the number of inputs to the classifier to be reduced, simplifying the system and improving real-time performance. The paper presents the algorithm, discusses its underlying principles and reports the validation results.
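
As a hedged sketch of the feature-extraction step (not the authors' implementation), FastICA can learn basis functions from a matrix of heartbeat segments and project each beat onto them; the "beats" below are synthetic mixtures of two independent waveforms, and all sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_beats, beat_len = 200, 64
# Toy "beats": random mixtures of two independent source waveforms plus noise
s1 = np.sin(np.linspace(0, 3 * np.pi, beat_len))
s2 = np.sign(np.sin(np.linspace(0, 7 * np.pi, beat_len)))
mixing = rng.normal(size=(n_beats, 2))
beats = mixing @ np.vstack([s1, s2]) + 0.05 * rng.normal(size=(n_beats, beat_len))

ica = FastICA(n_components=2, random_state=0)
features = ica.fit_transform(beats)  # per-beat ICA features for a classifier
print(features.shape)                # (200, 2)
```

In the paper these ICA features are concatenated with the R-R interval and QRS segment power before being fed to the neural network classifier.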


International Conference on Acoustics, Speech, and Signal Processing | 2006

Room Acoustic Parameter Extraction from Music Signals

Paul Kendrick; Trevor J. Cox; Yonggang Zhang; Jonathon A. Chambers; Francis F. Li

A new method, employing machine learning techniques and a modified low-frequency envelope spectrum estimator, for estimating important room acoustic parameters, including reverberation time (RT) and early decay time (EDT), from received music signals has been developed. It overcomes drawbacks found in applying music signals directly to the envelope spectrum detector developed for the estimation of RT from speech signals. The octave-band music signal is first separated into sub-bands corresponding to notes on the equal temperament scale, and the level of each note is normalised before applying an envelope spectrum detector. An artificial neural network is then trained to map these envelope spectra onto RT or EDT. Significant improvements in estimation accuracy were found, and further investigations confirmed that the non-stationary nature of music envelopes is a major technical challenge hindering accurate parameter extraction from music; the proposed method to some extent circumvents this difficulty.
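
The note-band separation relies on equal-temperament note centre frequencies, which this small helper computes; the MIDI note range chosen here is an illustrative assumption, and the per-note bandpass filtering and level normalisation are omitted.

```python
import numpy as np

def note_frequencies(low_midi=40, high_midi=76):
    """Equal-temperament note centres in Hz (A4 = 440 Hz reference):
    each semitone step multiplies frequency by 2**(1/12)."""
    midi = np.arange(low_midi, high_midi + 1)
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

freqs = note_frequencies()
print(round(freqs[69 - 40], 1))  # MIDI note 69 is A4: 440.0 Hz
```

Normalising the level of each note band compensates for the uneven note energies in music, which is one reason raw music fails with the speech-trained envelope detector.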


International Conference on Internet Monitoring and Protection | 2010

Sound-Based Multimodal Person Identification from Signature and Voice

Francis F. Li

Person identification as a security means has a variety of important applications. Many techniques and automated systems have been developed over the past few decades; each has its own advantages and limitations. There are often trade-offs amongst reliability, ease of use, ethical and human-rights issues, and acceptability in a particular application. Multimodal identification and authentication can, to some extent, alleviate these dilemmas and improve overall performance. This paper proposes a new method that combines signatures and utterances of pronounced names to identify or authenticate persons. Unlike typical signature verification methods, the dynamic features of signatures are captured as sound. The multimodal approach shows increased reliability, providing a relatively simple and potentially useful method for person identification and authentication.


International Conference on Pattern Recognition | 2004

Handwriting authentication by envelopes of sound signature

Francis F. Li

Friction between a rigid-nib pen and paper produces audible sounds that are correlated with the dynamics of writing. Such writing sounds were previously used as a biometric identity to achieve writer authentication. This paper presents an alternative and supplementary algorithm for sound-based handwriting authentication. Envelopes of writing sounds estimated by the Hilbert transform are found useful in differentiating topologically similar characters written by different individuals. A straightforward supervised neural network, in conjunction with a purpose-designed pre-processor, can be trained on examples to differentiate patterns of writing sounds effectively and thus achieve writer authentication, providing a straightforward and potentially useful alternative to existing methods.


International Conference on Acoustics, Speech, and Signal Processing | 2003

A neural network for blind identification of speech transmission index

Francis F. Li; Trevor J. Cox

A hybrid neural network model is proposed to determine the speech transmission index of a transmission channel from transmitted speech signals without resort to prior knowledge of original speech. It comprises a Hilbert transform pre-processor, a PCA network for speech feature extraction and a multilayer back-propagation network for nonlinear mapping and case generalization. The developed method utilizes naturally occurring speech signals as probe stimuli, reduces measurement channels from two to one and hence facilitates speech transmission channel assessments under in-use conditions.
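
The PCA-plus-back-propagation back end might be sketched with scikit-learn stand-ins as below; the data are synthetic, and the real model's features, layer sizes and training regime are not specified here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5))                  # hidden low-rank structure
W = rng.normal(size=(5, 40))
X = latent @ W + 0.1 * rng.normal(size=(300, 40))   # stand-in envelope features
sti = 1 / (1 + np.exp(-latent[:, 0]))               # toy STI-like target in (0, 1)

Z = PCA(n_components=5).fit_transform(X)            # PCA feature extraction
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(Z, sti)      # nonlinear mapping stage
print(net.predict(Z).shape)                         # one STI estimate per signal
```

Splitting the model into an unsupervised reduction stage and a supervised mapping stage mirrors the paper's hybrid architecture: PCA compresses the envelope spectra, and the back-propagation network learns the nonlinear relation to the transmission index.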


Quality of Multimedia Experience | 2016

Perception and automated assessment of audio quality in user generated content: An improved model

Bruno Fazenda; Paul Kendrick; Trevor J. Cox; Francis F. Li; Iain Jackson

Technology to record sound, available in personal devices such as smartphones or video recording devices, is now ubiquitous. However, the production quality of the sound on this user-generated content is often very poor: distorted, noisy, with garbled speech or indistinct music. Our interest lies in the causes of the poor recording, especially what happens between the sound source and the electronic signal emerging from the microphone, and in finding an automated method to warn the user of such problems. Typical problems, such as distortion, wind noise, microphone handling noise and frequency response, were tested. A perceptual model has been developed from subjective tests on the perceived quality of such errors and data measured from a training dataset composed of various audio files. It is shown that perceived quality is associated with distortion and frequency response, with wind and handling noise being slightly less important. In addition, the contextual content of the audio sample was found to modulate perceived quality at levels similar to degradations such as wind noise, rendering the degradation introduced by handling noise negligible.


Journal of the Acoustical Society of America | 2014

Perception and automatic detection of wind-induced microphone noise

Iain Jackson; Paul Kendrick; Trevor J. Cox; Bruno Fazenda; Francis F. Li

Wind can induce noise on microphones, causing problems for users of hearing aids and for those making recordings outdoors. Perceptual tests in the laboratory and via the Internet were carried out to understand what features of wind noise are important to the perceived audio quality of speech recordings. The average A-weighted sound pressure level of the wind noise was found to dominate the perceived degradation of quality, while gustiness was mostly unimportant. Large degradations in quality were observed when the signal-to-noise ratio was lower than about 15 dB. A model to allow an estimation of wind noise level was developed using an ensemble of decision trees. The model was designed to work with a single microphone in the presence of a variety of foreground sounds. The model output four classes of wind noise: none, low, medium, and high. Wind-free examples were accurately identified in 79% of cases. For the three classes with noise present, on average 93% of samples were correctly assigned. A second ensemble of decision trees was used to estimate the signal-to-noise ratio and thereby infer the perceived degradation caused by wind noise.
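
A minimal stand-in for the decision-tree ensemble could look like the following; the features, class boundaries and data here are synthetic placeholders, not the paper's trained model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
levels = rng.uniform(0, 60, size=500)          # synthetic wind-noise level, dB
y = np.digitize(levels, [10, 25, 40])          # 0..3 -> none/low/medium/high
X = np.column_stack([levels + rng.normal(0, 1, 500),  # noisy level estimate
                     rng.normal(size=500)])           # irrelevant feature
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print((clf.predict(X) == y).mean())            # training accuracy
```

An ensemble of trees copes well with class boundaries defined by level thresholds and ignores the irrelevant feature, which is consistent with the level-dominated perception result reported above.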


International Conference on Multimedia and Expo | 2013

Wind-induced microphone noise detection - automatically monitoring the audio quality of field recordings

Paul Kendrick; Trevor J. Cox; Francis F. Li; Bruno Fazenda; Iain Jackson

Wind-induced microphone noise is one of the most common problems leading to poor audio quality in recordings. A wind-noise detector could alert the operator of a recording device to the presence of wind noise so that appropriate action can be taken. This paper presents a single-channel algorithm which, in the presence of other sounds, detects and classifies wind noise according to level. A large training database is formed from a wind noise simulator which generates an audio stream based on time histories of real wind velocities. A Support Vector Machine detects and classifies according to wind noise level in 25 ms frames which may contain other sounds. Statistical and temporal data from the detector over a sequence of frames are then used to provide estimates of the average wind noise level. The detector is successfully demonstrated on a number of devices with non-simulated data.
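
A toy version of the frame-wise detection might look like this: 25 ms frames, per-frame RMS as the only feature, and a generic SVM in place of the paper's trained model; the two synthetic signals stand in for clean and wind-contaminated audio.

```python
import numpy as np
from sklearn.svm import SVC

def frame_signal(x, fs, frame_ms=25):
    """Split audio into non-overlapping 25 ms frames."""
    n = int(fs * frame_ms / 1000)
    return x[:len(x) // n * n].reshape(-1, n)

fs = 8000
rng = np.random.default_rng(0)
quiet = rng.normal(0, 0.01, fs)                 # 1 s of near-silence
windy = rng.normal(0, 0.3, fs)                  # 1 s of noise-like "wind"
frames = np.concatenate([frame_signal(quiet, fs), frame_signal(windy, fs)])
rms = np.sqrt(np.mean(frames ** 2, axis=1, keepdims=True))  # per-frame feature
y = np.repeat([0, 1], 40)                       # 0 = clean frame, 1 = windy
clf = SVC().fit(rms, y)                         # frame-wise wind detector
print(clf.score(rms, y))
```

The paper's detector then aggregates frame decisions over time to estimate the average wind noise level, a step omitted from this sketch.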

Collaboration


Dive into Francis F. Li's collaborations.

Top Co-Authors

Bruno Fazenda
University of Huddersfield

Iain Jackson
University of Manchester

Yonggang Zhang
Harbin Engineering University

Tim D. Jackson
Manchester Metropolitan University