
Publication


Featured research published by Andrew Hines.


Speech Communication | 2012

Speech intelligibility prediction using a Neurogram Similarity Index Measure

Andrew Hines; Naomi Harte

Discharge patterns produced by fibres from normal and impaired auditory nerves in response to speech and other complex sounds can be discriminated subjectively through visual inspection. Similarly, responses from auditory nerves where speech is presented at diminishing sound levels progressively deteriorate from those at normal listening levels. This paper presents a Neurogram Similarity Index Measure (NSIM) that automates this inspection process, and translates the response pattern differences into a bounded discrimination metric. Performance intensity functions can be used to provide additional information over measurement of speech reception threshold and maximum phoneme recognition by plotting a test subject's recognition probability over a range of sound intensities. A computational model of the auditory periphery was used to replace the human subject and develop a methodology that simulates a real listener test. The newly developed NSIM is used to evaluate the model outputs in response to Consonant-Vowel-Consonant (CVC) word lists and produce phoneme discrimination scores. The simulated results are rigorously compared to those from normal hearing subjects in both quiet and noise conditions. The accuracy of the tests and the minimum number of word lists necessary for repeatable results are established, and the results are compared to predictions using the speech intelligibility index (SII). The experiments demonstrate that the proposed simulated performance intensity function (SPIF) produces results with confidence intervals within the human error bounds expected with real listener tests. This work represents an important step in validating the use of auditory nerve models to predict speech intelligibility.
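NSIM belongs to the SSIM family of similarity measures, comparing an intensity (luminance) term and a structure term between two neurograms. A minimal sketch in Python, assuming neurograms arrive as 2-D NumPy arrays (frequency bands by time bins); the global statistics and constants below are illustrative simplifications, not the paper's windowed, tuned formulation:

```python
import numpy as np

def nsim(ref, deg, c1=0.01, c2=0.03):
    """Simplified NSIM-style similarity sketch.

    Compares an intensity (luminance) term and a structure term
    between a reference and a degraded neurogram. The real NSIM uses
    local windows and tuned constants; here statistics are global and
    the constants c1, c2 are illustrative assumptions.
    """
    ref = np.asarray(ref, dtype=float)
    deg = np.asarray(deg, dtype=float)
    mu_r, mu_d = ref.mean(), deg.mean()
    sd_r, sd_d = ref.std(), deg.std()
    cov = ((ref - mu_r) * (deg - mu_d)).mean()
    luminance = (2 * mu_r * mu_d + c1) / (mu_r**2 + mu_d**2 + c1)
    structure = (cov + c2) / (sd_r * sd_d + c2)
    return luminance * structure

# Identical neurograms score ~1.0; degraded ones score lower.
rng = np.random.default_rng(0)
clean = rng.random((64, 100))               # freq bands x time bins
noisy = clean + 0.5 * rng.random((64, 100))
print(nsim(clean, clean))                   # close to 1.0
print(nsim(clean, noisy))                   # below 1.0
```

The bounded score is what makes the measure usable as a discrimination metric: identical response patterns map to 1, and increasing degradation pushes the score down.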


Speech Communication | 2010

Speech intelligibility from image processing

Andrew Hines; Naomi Harte

Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility and threshold levels. The development of computational models of the auditory periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. The responses of the model used in this study have been previously shown to be consistent with a wide range of physiological data from both normal and impaired ears for stimuli presentation levels spanning the dynamic range of hearing. The model output can be assessed by examination of the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated in this study using two types of neurograms: temporal fine structure (TFS) and average discharge rate or temporal envelope. A new systematic way of assessing phonemic degradation is proposed using the outputs of an auditory nerve model for a range of SNHLs. The mean structured similarity index (MSSIM) is an objective measure originally developed to assess perceptual image quality. The measure is adapted here for use in measuring the phonemic degradation in neurograms derived from impaired auditory nerve outputs. A full evaluation of the choice of parameters for the metric is presented using a large amount of natural human speech. The metric's boundedness and the results for TFS neurograms indicate that it is superior to standard point-to-point metrics of relative mean absolute error and relative mean squared error. MSSIM as an indicative score of intelligibility is also promising, with results similar to those of the standard speech intelligibility index metric.


International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2013

Robustness of speech quality metrics to background noise and network degradations: Comparing ViSQOL, PESQ and POLQA

Andrew Hines; Jan Skoglund; Anil C. Kokaram; Naomi Harte

The Virtual Speech Quality Objective Listener (ViSQOL) is a new objective speech quality model. It is a signal-based, full-reference metric that uses a spectro-temporal measure of similarity between a reference and a test speech signal. ViSQOL aims to predict the overall quality of experience for the end listener, whether the cause of speech quality degradation is ambient noise or transmission channel degradations. This paper describes the algorithm and tests the model using two speech corpora: NOIZEUS and E4. The NOIZEUS corpus contains speech under a variety of background noise types, speech enhancement methods, and SNR levels. The E4 corpus contains voice over IP degradations including packet loss, jitter and clock drift. The results are compared with the ITU-T objective models for speech quality: PESQ and POLQA. The behaviour of the metrics is also evaluated under simulated time warp conditions. The results show that for both datasets ViSQOL performed comparably with PESQ. POLQA was shown to have lower correlation with subjective scores than the other metrics for the NOIZEUS database.
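Comparisons like the one above are usually quantified by correlating each metric's predicted MOS with subjective listener scores per condition. A minimal sketch of that evaluation step; the score values below are invented for illustration, not results from the paper:

```python
import numpy as np

# Hypothetical per-condition scores: subjective MOS from listener
# tests vs. one metric's predictions (values are made up).
subjective_mos = np.array([1.2, 2.1, 2.8, 3.5, 4.0, 4.6])
predicted_mos  = np.array([1.5, 2.0, 3.0, 3.3, 4.2, 4.4])

# Pearson correlation: how well the metric tracks listener opinion.
# Values near 1 indicate the metric ranks conditions like listeners do.
r = np.corrcoef(subjective_mos, predicted_mos)[0, 1]
print(round(r, 3))
```

A complete benchmark would also report per-corpus correlations and error measures such as RMSE, since a metric can rank conditions well while being biased in absolute terms.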


Journal of the Acoustical Society of America | 2015

ViSQOLAudio: An objective audio quality metric for low bitrate codecs

Andrew Hines; Eoin Gillen; Damien Kelly; Jan Skoglund; Anil C. Kokaram; Naomi Harte

Streaming services seek to optimise their use of bandwidth across audio and visual channels to maximise the quality of experience for users. This letter evaluates whether objective quality metrics can predict the audio quality for music encoded at low bitrates by comparing objective predictions with results from listener tests. Three objective metrics were benchmarked: PEAQ, POLQA, and ViSQOLAudio. The results demonstrate that objective metrics designed for speech quality assessment have strong potential for quality assessment of low bitrate audio codecs.


ACM Multimedia | 2014

Perceived Audio Quality for Streaming Stereo Music

Andrew Hines; Eoin Gillen; Damien Kelly; Jan Skoglund; Anil C. Kokaram; Naomi Harte

Users of audio-visual streaming services expect an ever-increasing quality of experience. Channel bandwidth remains a bottleneck commonly addressed with lossy compression schemes for both the video and audio streams. Anecdotal evidence suggests a strongly perceived link between bit rate and quality. This paper presents three audio quality listening experiments using the ITU MUSHRA methodology to assess a number of audio codecs typically used by streaming services. They were assessed over a range of bit rates using three presentation modes: consumer-quality headphones, studio-quality headphones, and loudspeakers. Our results indicate that with consumer-quality headphones, listeners did not differentiate between codecs with bit rates greater than 48 kb/s (p>=0.228). For studio-quality headphones and loudspeakers, AAC-LC at 128 kb/s and higher was differentiated from the other codecs (p<=0.001). The results provide insights into quality of experience that will guide future development of objective audio quality metrics.


Quality of Multimedia Experience (QoMEX) | 2013

Detailed comparative analysis of PESQ and ViSQOL behaviour in the context of playout delay adjustments introduced by VoIP jitter buffer algorithms

Andrew Hines; Peter Pocta; Hugh Melvin

This paper undertakes a detailed comparative analysis of PESQ and ViSQOL model behaviour when tested against speech samples modified through playout delay adjustments. The adjustments are typical, in extent and magnitude, of those introduced by VoIP jitter buffer algorithms. Furthermore, the analysis examines the impact of adjustment location as well as speaker factors on the MOS scores predicted by both models, and seeks to determine whether the models correctly predict the impact on quality perceived by the end user in earlier subjective tests. Those earlier results showed that speaker voice preference, and potentially wideband experience, dominated the subjective tests more than playout delay adjustment duration or location did. By design, PESQ and ViSQOL do not account for speaker voice differences, which reduces their correlation with the subjective tests. In addition, it was found that PESQ scores are affected by playout delay adjustments, so the impact of such adjustments on the quality perceived by the end user is not well modelled. The ViSQOL model is better at predicting the impact of playout delay adjustments on perceived quality, but some discrepancies remain in the predicted scores; the reasons for these discrepancies are analysed and discussed.


Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge | 2014

Building a Database of Political Speech: Does Culture Matter in Charisma Annotations?

Ailbhe Cullen; Andrew Hines; Naomi Harte

For both individual politicians and political parties, the internet has become a vital tool for self-promotion and the distribution of ideas. The rise of streaming has enabled political debates and speeches to reach global audiences. In this paper, we explore the nature of charisma in political speech, with a view to automatic detection. To this end, we have collected a new database of political speech from YouTube and other on-line resources. Annotation is performed both by native listeners and by Amazon Mechanical Turk (AMT) workers. Detailed analysis shows that both label sets are equally reliable. The results support the use of crowd-sourced labels for speaker traits such as charisma in political speech, even where cultural subtleties are present. The impact of these different annotations on charisma prediction from political speech is also investigated.


Quality of Multimedia Experience (QoMEX) | 2015

TCD-VoIP, a research database of degraded speech for assessing quality in VoIP applications

Naomi Harte; Eoin Gillen; Andrew Hines

There are many types of degradation which can occur in Voice over IP calls. Degradations which occur independently of the codec, hardware, or network in use are the focus of this paper. The development of new quality metrics for modern communication systems depends heavily on the availability of suitable test and development data with subjective quality scores. A new dataset of VoIP degradations (TCD-VoIP) has been created and is presented in this paper. The dataset contains speech samples with a range of common VoIP degradations, and the corresponding set of subjective opinion scores from 24 listeners. The dataset is publicly available.


Quality of Multimedia Experience (QoMEX) | 2017

A framework for post-stroke quality of life prediction using structured prediction

Andrew Hines; John D. Kelleher

This paper presents a conceptual model that relates Quality of Life to the established Quality of Experience formation process. It uses concepts developed by the Quality of Experience community to propose an adapted framework for developing predictive models for Quality of Life. A mapping of common factors that can be applied to health related quality of life is proposed and practical challenges for modelling and applications are presented and discussed. The process of identifying and categorising factors and features is illustrated using stroke patient treatment as an example use case.


IEEE Transactions on Broadcasting | 2017

Objective Assessment of Perceptual Audio Quality Using ViSQOLAudio

Colm Sloan; Naomi Harte; Damien Kelly; Anil C. Kokaram; Andrew Hines

Digital audio broadcasting services transmit substantial amounts of data that is encoded to minimize bandwidth whilst maximizing user quality of experience. Many large service providers continually alter codecs to improve the encoding process. Performing subjective tests to validate each codec alteration would be impractical, necessitating the use of objective perceptual audio quality models. This paper evaluates the quality scores from ViSQOLAudio, an objective perceptual audio quality model, against the quality scores of PEAQ, POLQA, and PEMO-Q on three datasets containing fullband audio encoded with a variety of codecs and bitrates. The results show that ViSQOLAudio was more accurate than all other models on two of the datasets and performed well on the third, demonstrating the utility of ViSQOLAudio for predicting the perceptual audio quality for encoded music.

Collaboration

Top co-authors of Andrew Hines:

Hugh Melvin (National University of Ireland)
Adriaan Barri (Vrije Universiteit Brussel)
Manish Narwaria (Nanyang Technological University)
Judith Redi (Delft University of Technology)