Philip Hanna
Queen's University Belfast
Publications
Featured research published by Philip Hanna.
Speech Communication | 2005
Michael F. McTear; Ian M. O'Neill; Philip Hanna; Xingkun Liu
A number of different approaches have been applied to the treatment of errors in spoken dialogue systems, including careful design to prevent potential errors, methods for on-line error detection, and error recovery when errors have occurred and have been detected. The approach to error handling presented here is premised on the theory of grounding, in which it is assumed that errors cannot be avoided in spoken dialogue and that it is more useful to focus on methods for determining what information needs to be grounded within a dialogue and how this grounding should be achieved. An object-based architecture is presented that incorporates generic confirmation strategies in combination with domain-specific heuristics that together contribute to determining the system’s confirmation strategies when attempting to complete a transaction. The system makes use of a representation of the system’s information state as it conducts a transaction along with discourse pegs that are used to determine whether values have been sufficiently confirmed for a transaction to be concluded. An empirical evaluation of the system is presented along with a discussion of the advantages of the object-based approach for error handling.
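The discourse-peg idea can be illustrated with a minimal sketch. The class name, peg increments, confidence bands and grounding threshold below are illustrative assumptions, not values taken from the paper:

```python
class Slot:
    """One transaction value plus a discourse peg tracking how well it
    has been grounded (names and thresholds are illustrative)."""
    GROUNDED_AT = 2  # peg level at which the value counts as confirmed

    def __init__(self, name):
        self.name = name
        self.value = None
        self.peg = 0

    def hear(self, value, confidence):
        """Update the slot from a recognised value and its ASR confidence."""
        if self.value is None or value != self.value:
            self.value, self.peg = value, 0  # new or conflicting value: reset
        if confidence > 0.8:
            self.peg += 2   # high confidence grounds the value implicitly
        elif confidence > 0.5:
            self.peg += 1   # medium confidence: needs further confirmation
        # low confidence leaves the peg unchanged; the system should re-ask

    def grounded(self):
        return self.value is not None and self.peg >= self.GROUNDED_AT
```

A transaction concludes only once every slot it needs reports `grounded()`, which is the role the discourse pegs play in the abstract above.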
IEEE Transactions on Speech and Audio Processing | 2005
James McAuley; Ji Ming; Darryl Stewart; Philip Hanna
This paper investigates the effect of modeling subband correlation for noisy speech recognition. Subband feature streams are assumed to be independent in many subband-based speech recognition systems. However, speech recognition experimental results suggest this assumption is unrealistic. In this paper, a method is proposed to incorporate correlation into subband speech feature streams. In the proposed method, all possible combinations of subbands are created and each combination is treated as a single frequency-band by calculating a single feature vector for it. The resulting feature vectors, therefore, capture information about every band in the combination, as well as the dependency across the bands. Although using the new features results in a higher computational complexity, our experimental results show that they effectively capture the correlation between the subbands while making minimal assumptions about the structure of the correlation. Experiments are conducted on the TIDigits database. The results demonstrate improved accuracy for clean speech recognition and improved robustness in the presence of both stationary and nonstationary band-selective noise, in comparison to a system assuming subband independence.
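The combination scheme can be sketched as follows. In the paper each combination gets a single feature vector computed over the merged frequency range; plain concatenation stands in for that step here, and the function and dictionary layout are illustrative:

```python
from itertools import combinations

def subband_combinations(subband_features):
    """subband_features: dict mapping subband name -> feature vector (list).
    Returns a dict mapping each non-empty combination of subbands to one
    feature vector, so each combination is treated as a single band."""
    names = sorted(subband_features)
    combos = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # A real system would recompute e.g. cepstral features over the
            # merged frequency range; concatenation stands in for that here.
            vec = []
            for name in combo:
                vec.extend(subband_features[name])
            combos[combo] = vec
    return combos
```

With B subbands this yields 2^B − 1 feature streams, which is the source of the higher computational cost the abstract mentions.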
Science of Computer Programming | 2005
Ian M. O'Neill; Philip Hanna; Xingkun Liu; Des Greer; Michael F. McTear
In this article we describe how Java can be used to implement an object-based, cross-domain, mixed initiative spoken dialogue manager (DM). We describe how dialogue that crosses between several business domains can be modelled as an inheriting and collaborating suite of objects suitable for implementation in Java. We describe the main features of the Java implementation and how the Java dialogue manager can be interfaced via the Galaxy software hub, as used in the DARPA-sponsored Communicator projects in the United States, with the various off-the-shelf components that are needed in a complete end-to-end spoken dialogue system. We describe the interplay of the Java components in the course of typical dialogue turns and present an example of the sort of dialogue that the Java DM can support.
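The inheriting-and-collaborating object structure can be sketched as follows, in Python for brevity rather than the paper's Java; all class, method and slot names are illustrative, not from the published system:

```python
class DiscourseManager:
    """Generic confirmation behaviour shared across domains."""
    def confirm(self, slot, value):
        return f"You want {slot} = {value}, is that right?"

class DomainExpert(DiscourseManager):
    """Base class for one business domain; subclasses inherit the generic
    discourse behaviour and add their own slots and rules."""
    slots = ()
    def handles(self, slot):
        return slot in self.slots

class FlightExpert(DomainExpert):
    slots = ("origin", "destination", "date")

class HotelExpert(DomainExpert):
    slots = ("city", "checkin", "nights")

class DialogueManager:
    """Routes each user-supplied slot to the expert that owns it, so a
    single dialogue can cross between domains mid-transaction."""
    def __init__(self, experts):
        self.experts = experts
    def handle(self, slot, value):
        for expert in self.experts:
            if expert.handles(slot):
                return expert.confirm(slot, value)
        return f"Sorry, I don't know about {slot}."
```

The cross-domain behaviour described in the abstract falls out of the routing step: whichever expert claims the slot carries the dialogue forward, while the shared superclass keeps the confirmation style uniform.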
international conference on acoustics speech and signal processing | 1999
Ji Ming; Philip Hanna; Darryl Stewart; Marie Owens; Francis Jack Smith
Most current speech recognition systems are built upon a single type of model, e.g. an HMM or a certain type of segment-based model, and typically employ only one type of acoustic feature, e.g. MFCCs and their variants. As a consequence, the system may not be robust should the modeling assumptions be violated. Recent research efforts have investigated the use of multi-scale/multi-band acoustic features for robust speech recognition. This paper describes a multi-model approach as an alternative and complement to the multi-feature approaches. The multi-model approach seeks a combination of different types of acoustic models, thereby integrating the capabilities of each individual model for capturing discriminative information. An example system built upon the combination of the standard HMM technique with a segment-based modeling technique was implemented. Experiments on both isolated-word and continuous speech recognition have shown improved performance over each of the individual models considered in isolation.
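At its simplest, such a combination can be a weighted sum of per-model log-likelihoods per hypothesis. The linear weighting below is one common choice, sketched as an assumption; the abstract does not state the paper's actual combination rule:

```python
def combine_log_scores(scores, weights):
    """Linear combination of per-model log-likelihoods for one hypothesis.
    scores: dict model name -> log-likelihood; weights: dict of same keys."""
    return sum(weights[m] * scores[m] for m in scores)

def recognise(hypothesis_scores, weights):
    """hypothesis_scores: dict word -> {model name: log-likelihood}.
    Returns the word with the best combined score across all models."""
    return max(hypothesis_scores,
               key=lambda w: combine_log_scores(hypothesis_scores[w], weights))
```

The appeal of the multi-model view is visible even in this toy form: a hypothesis only wins when it scores well under the combination, so each model's weaknesses can be compensated by the other.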
Artificial Intelligence Review | 2009
Le Quan Ha; Philip Hanna; Ji Ming; Francis Jack Smith
Experiments show that for a large corpus, Zipf’s law does not hold for all ranks of words: the frequencies fall below those predicted by Zipf’s law for ranks greater than about 5,000 word types in the English language and about 30,000 word types in the inflected languages Irish and Latin. It also does not hold for syllables or words in the syllable-based languages, Chinese or Vietnamese. However, when single words are combined together with word n-grams in one list and put in rank order, the frequency of tokens in the combined list extends Zipf’s law with a slope close to −1 on a log-log plot in all five languages. Further experiments have demonstrated the validity of this extension of Zipf’s law to n-grams of letters, phonemes or binary bits in English. It is shown theoretically that probability theory alone can predict this behavior in randomly created n-grams of binary bits.
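The rank-ordering procedure described above can be reproduced with a short sketch. The toy helper names and the least-squares fit are ours; a real test of the extended law needs a large corpus before the slope approaches −1:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def combined_zipf(tokens, max_n=3):
    """Merge 1-grams through max_n-grams into one frequency list and rank
    it, as in the extension of Zipf's law described above."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts.most_common()  # [(token, freq), ...] by falling frequency

def loglog_slope(ranked):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(f) for _, f in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

On a sufficiently large corpus, plotting `log(frequency)` against `log(rank)` for the combined list is what yields the slope close to −1 reported above.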
Speech Communication | 1999
Philip Hanna; Ji Ming; Francis Jack Smith
This paper extends the notion of capturing temporal information within an HMM framework by permitting the observed frame not only to be dependent upon preceding frames, but also upon succeeding frames. In particular the IFD–HMM (Ming and Smith, 1996) is extended to support any number of preceding and/or succeeding frame dependencies. The means through which such a dependency might be integrated into an HMM framework are explored, and details given of the resultant changes to the IFD–HMM. Experimental results are provided, contrasting the use of bi-directional frame dependencies to the use of preceding-only frame dependencies and exploring how such dependencies can best be employed. It was found that a dependency upon succeeding frames enabled dynamic spectral information not found in the preceding frames to be usefully employed, resulting in a significant increase in recognition accuracy. It was also found that the use of frame dependencies proved to be a more effective means of increasing recognition accuracy than the use of multiple mixtures.
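Schematically, the bi-directional dependency means each frame is scored against both its preceding and succeeding neighbours rather than preceding frames alone. The sketch below shows only that control flow; the generic `score_fn` is a placeholder, not the IFD–HMM's actual conditional densities:

```python
def sequence_score(frames, score_fn, n_prev=1, n_next=1):
    """Score each frame conditioned on up to n_prev preceding and n_next
    succeeding frames (bi-directional frame dependency, schematically).
    score_fn(frame, prev, nxt) returns that frame's contribution."""
    total = 0.0
    for t, frame in enumerate(frames):
        prev = frames[max(0, t - n_prev):t]      # preceding context
        nxt = frames[t + 1:t + 1 + n_next]       # succeeding context
        total += score_fn(frame, prev, nxt)
    return total
```

Setting `n_next=0` recovers the preceding-only dependency that the paper's experiments use as the baseline for comparison.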
ACM Transactions on Speech and Language Processing | 2007
Philip Hanna; Ian M. O'Neill; Craig Wootton; Michael F. McTear
This article describes how an object-oriented approach can be applied to the architectural design of a spoken language dialog system with the aim of facilitating the modification, extension, and reuse of discourse-related expertise. The architecture of the developed system is described and a functionally similar VoiceXML system is used to provide a comparative baseline across a range of modification and reuse scenarios. It is shown that the use of an object-oriented dialog manager provides an effective means of reusing existing discourse expertise in a manner that limits the degree of structural decay associated with system change.
international conference on pattern recognition | 2000
Pat Corr; Darryl Stewart; Philip Hanna; Ji Ming; Francis Jack Smith
Although the discrete cosine transform (DCT) is widely used for feature extraction in pattern recognition, it is shown that it converges slowly for most theoretically smooth functions. A modification of the DCT is described, based on a change of variable, which changes it to a new transform, called the discrete Chebyshev transform (DChT), which converges very rapidly for the same smooth functions. Although this rapid convergence is largely destroyed by the noise in real experimental data, the discrete Chebyshev transform is still generally better than the DCT when the sampling of the data can be selected at nonequidistant points. The improvement over the DCT gives a theoretical explanation for improved speech recognition obtained using Mel feature cepstral coefficients. These choose the sampling frequencies of a DCT to correspond to the human perception of pitch. It is shown that this sampling is similar to the sampling used in the discrete Chebyshev transform.
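The change of variable amounts to sampling the function at Chebyshev points before taking the DCT. A minimal sketch, assuming the standard DCT-II definition; the function names and the 2/N scaling convention are ours:

```python
import math

def dct_coefficients(samples):
    """DCT-II coefficients of a sample vector (scaled by 2/N)."""
    N = len(samples)
    return [(2.0 / N) * sum(s * math.cos(math.pi * n * (2 * k + 1) / (2 * N))
                            for k, s in enumerate(samples))
            for n in range(N)]

def discrete_chebyshev_transform(f, N):
    """DChT: sample f at the Chebyshev points x_k = cos(pi*(2k+1)/(2N))
    on [-1, 1], then take the DCT of those samples. The change of
    variable makes the coefficients decay rapidly for smooth f."""
    samples = [f(math.cos(math.pi * (2 * k + 1) / (2 * N))) for k in range(N)]
    return dct_coefficients(samples)
```

For a smooth function such as `exp` on [−1, 1], the DChT coefficients decay roughly geometrically, whereas the DCT of equispaced samples of the same (non-periodic) function decays only algebraically, which is the slow convergence the paper sets out to fix.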
integrating technology into computer science education | 2017
Aidan McGowan; Philip Hanna; Desmond Greer; John Busch
Much previous research has indicated that where a student sits in a university lecture theatre correlates with their final grade. Frequently, those students who sit regularly in the front rows have been reported to achieve the highest grades. However, most of this research restricted student seat movement, which is unnatural and may have adversely influenced the results. A unique, previously reported unrestricted seat-tracking investigation by the authors of this paper used a web and mobile software tracking application (PinPoint) to investigate seating-related student performance in a 12-week Java programming university module. The PinPoint investigation concluded that the best assessment results were achieved by the students in the front rows and that assessment scores degraded the further students sat from the front. Additionally, while the most engaged students were found to regularly sit at the front, the same was not true of the most academically able or those with the greatest prior programming experience. This paper presents a further analysis of the PinPoint data, focusing on assessment performance within similar groups (academic ability, engagement and prior programming experience), and additionally presents the results of a temporal movement study and a qualitative analysis of group and individual student seating decisions. It concludes that a comparison of student assessment performance within each of the peer groups found, in every instance, that the front-row students outperformed their peers sitting further back. This strongly suggests that there is a benefit to sitting at the front regardless of academic ability, engagement or prior subject knowledge. It also points to other, untested factors that may be positively influencing front-row performance.
Computer Science Education | 2015
Philip Hanna; Angela Allen; Russell Kane; Neil Anderson; Aidan McGowan; Matthew Collins; Malcolm Hutchison
This paper outlines a means of improving the employability skills of first-year university students through a closely integrated model of employer engagement within computer science modules. The outlined approach illustrates how employability skills, including communication, teamwork and time management skills, can be contextualised in a manner that directly relates to student learning but can still be linked forward into employment. The paper tests the premise that developing employability skills early within the curriculum will result in improved student engagement and learning within later modules. The paper concludes that embedding employer participation within first-year modules can help translate a distant notion of employability into something of more immediate relevance in terms of how students can best approach learning. Further, by enhancing employability skills early within the curriculum, it becomes possible to improve academic attainment within later modules.