Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Gerald Penn is active.

Publication


Featured research published by Gerald Penn.


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Convolutional neural networks for speech recognition

Ossama Abdel-Hamid; Abdel-rahman Mohamed; Hui Jiang; Li Deng; Gerald Penn; Dong Yu

Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations. Experimental results show that CNNs reduce the error rate by 6%-10% compared with DNNs on the TIMIT phone recognition and the voice search large vocabulary speech recognition tasks.
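To make the frequency-axis convolution and pooling concrete, here is a minimal, self-contained sketch (not code from the paper; the 40-band input, filter sizes, and ReLU non-linearity are assumptions):

import numpy as np

# Toy filterbank input: 40 mel-frequency bands for a single frame.
# (The paper convolves along frequency; time/context handling is omitted here.)
np.random.seed(0)
feats = np.random.randn(40)          # 40 frequency bands (assumed)
filt_size, num_filters = 8, 3        # local receptive field of 8 bands, 3 filters
weights = np.random.randn(num_filters, filt_size) * 0.1
bias = np.zeros(num_filters)

# Full weight sharing: the same filter slides across all frequency positions.
def conv1d_freq(x, W, b):
    out_len = len(x) - W.shape[1] + 1
    out = np.empty((W.shape[0], out_len))
    for f in range(W.shape[0]):
        for i in range(out_len):
            out[f, i] = np.dot(W[f], x[i:i + W.shape[1]]) + b[f]
    return np.maximum(out, 0.0)       # ReLU-style non-linearity (assumed)

# Max-pooling over small frequency neighbourhoods gives tolerance to
# small shifts along the frequency axis (e.g., speaker variation).
def max_pool(fmap, pool=3):
    out_len = fmap.shape[1] // pool
    return fmap[:, :out_len * pool].reshape(fmap.shape[0], out_len, pool).max(axis=2)

conv_out = conv1d_freq(feats, weights, bias)   # shape (3, 33)
pooled = max_pool(conv_out)                    # shape (3, 11)
print(conv_out.shape, pooled.shape)

The limited-weight-sharing scheme proposed in the paper would, roughly, tie filter weights only within limited ranges of frequency bands rather than across the whole axis, since spectral patterns differ between low and high frequencies.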


IEEE Transactions on Visualization and Computer Graphics | 2009

Bubble Sets: Revealing Set Relations with Isocontours over Existing Visualizations

Christopher Collins; Gerald Penn; M. Sheelagh T. Carpendale

While many data sets contain multiple relationships, depicting more than one data relationship within a single visualization is challenging. We introduce Bubble Sets as a visualization technique for data that has both a primary data relation with a semantically significant spatial organization and a significant set membership relation in which members of the same set are not necessarily adjacent in the primary layout. In order to maintain the spatial rights of the primary data relation, we avoid layout adjustment techniques that improve set cluster continuity and density. Instead, we use a continuous, possibly concave, isocontour to delineate set membership, without disrupting the primary layout. Optimizations minimize cluster overlap and provide for calculation of the isocontours at interactive speeds. Case studies show how this technique can be used to indicate multiple sets on a variety of common visualizations.
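A rough sketch of the central idea, not the authors' implementation: set members contribute energy to a scalar field computed over the existing layout, and the set boundary is drawn where that field crosses a threshold. The toy coordinates, kernel, and threshold below are assumptions:

import numpy as np

# Fixed positions taken from the primary layout (invented toy coordinates).
members = np.array([[2.0, 2.0], [3.0, 2.5], [6.0, 6.0]])   # items in the set
grid_x, grid_y = np.meshgrid(np.linspace(0, 8, 81), np.linspace(0, 8, 81))

# Each member adds radially decaying energy; items keep their original positions.
def energy_field(points, radius=1.5):
    field = np.zeros_like(grid_x)
    for px, py in points:
        d2 = (grid_x - px) ** 2 + (grid_y - py) ** 2
        field += np.exp(-d2 / (2 * radius ** 2))
    return field

field = energy_field(members)
threshold = 0.4                      # the isocontour level (assumed)
inside = field >= threshold          # cells enclosed by the bubble boundary
print("cells inside the contour:", int(inside.sum()))

The published technique additionally reduces energy near items that are not in the set to limit overlap, and traces the contour itself (rather than a filled mask) at interactive speeds.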


International Conference on Acoustics, Speech, and Signal Processing | 2012

Understanding how Deep Belief Networks perform acoustic modelling

Abdel-rahman Mohamed; Geoffrey E. Hinton; Gerald Penn

Deep Belief Networks (DBNs) are a very competitive alternative to Gaussian mixture models for relating states of a hidden Markov model to frames of coefficients derived from the acoustic input. They are competitive for three reasons: DBNs can be fine-tuned as neural networks; DBNs have many non-linear hidden layers; and DBNs are generatively pre-trained. This paper illustrates how each of these three aspects contributes to the DBN's good recognition performance using both phone recognition performance on the TIMIT corpus and a dimensionally reduced visualization of the relationships between the feature vectors learned by the DBNs that preserves the similarity structure of the feature vectors at multiple scales. The same two methods are also used to investigate the most suitable type of input representation for a DBN.
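For illustration only: the generative pre-training mentioned above builds a stack of Restricted Boltzmann Machines, each trained with contrastive divergence. Below is a single CD-1 update for one binary layer; the layer sizes, learning rate, and toy input are assumptions, and bias updates are omitted for brevity:

import numpy as np

rng = np.random.default_rng(0)
visible_dim, hidden_dim, lr = 39, 64, 0.01       # e.g. a 39-dim acoustic frame (assumed)
W = rng.normal(0, 0.01, (visible_dim, hidden_dim))
b_v, b_h = np.zeros(visible_dim), np.zeros(hidden_dim)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One contrastive-divergence (CD-1) update on a single binary training vector."""
    ph0 = sigmoid(v0 @ W + b_h)                   # hidden probabilities given the data
    h0 = (rng.random(hidden_dim) < ph0) * 1.0     # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b_v)                 # reconstruct the visible units
    ph1 = sigmoid(pv1 @ W + b_h)                  # hidden probabilities given the reconstruction
    # Positive phase minus negative phase approximates the likelihood gradient.
    return np.outer(v0, ph0) - np.outer(pv1, ph1)

v0 = (rng.random(visible_dim) < 0.5) * 1.0        # toy binary "frame"
W += lr * cd1_step(v0)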


IEEE VGTC Conference on Visualization | 2009

DocuBurst: Visualizing document content using language structure

Christopher Collins; M. Sheelagh T. Carpendale; Gerald Penn

Textual data is at the forefront of information management problems today. One response has been the development of visualizations of text data. These visualizations, commonly based on simple attributes such as relative word frequency, have become increasingly popular tools. We extend this direction, presenting the first visualization of document content which combines word frequency with the human-created structure in lexical databases to create a visualization that also reflects semantic content. DocuBurst is a radial, space-filling layout of hyponymy (the IS-A relation), overlaid with occurrence counts of words in a document of interest to provide visual summaries at varying levels of granularity. Interactive document analysis is supported with geometric and semantic zoom, selectable focus on individual words, and linked access to source text.
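A hypothetical sketch of the radial space-filling idea: each node of the hyponymy tree receives an angular wedge proportional to the occurrence counts in its subtree, so frequent subtrees occupy more of the circle. The tree and counts below are invented:

# A toy hyponymy (IS-A) tree with per-word occurrence counts (all invented).
tree = {
    "entity": {"count": 2, "children": {
        "animal": {"count": 5, "children": {
            "dog": {"count": 8, "children": {}},
            "cat": {"count": 3, "children": {}},
        }},
        "artifact": {"count": 1, "children": {
            "chair": {"count": 4, "children": {}},
        }},
    }},
}

def subtree_count(node):
    return node["count"] + sum(subtree_count(c) for c in node["children"].values())

def assign_wedges(name, node, start=0.0, end=360.0, depth=0, out=None):
    """Give each node an angular wedge proportional to its subtree's word count."""
    if out is None:
        out = []
    out.append((name, depth, start, end))
    total = sum(subtree_count(c) for c in node["children"].values())
    cursor = start
    for child_name, child in node["children"].items():
        share = (end - start) * subtree_count(child) / total if total else 0.0
        assign_wedges(child_name, child, cursor, cursor + share, depth + 1, out)
        cursor += share
    return out

for name, depth, a0, a1 in assign_wedges("entity", tree["entity"]):
    print(f"{'  ' * depth}{name}: {a0:.1f} to {a1:.1f} degrees")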


Studia Logica | 1989

Categorial grammars determined from linguistic data by unification

Wojciech Buszkowski; Gerald Penn

We provide an algorithm for determining a categorial grammar from linguistic data that essentially uses unification of type-schemes assigned to atoms. The algorithm presented here extends an earlier one restricted to rigid categorial grammars, introduced in [4] and [5], by admitting non-rigid outputs. The key innovation is the notion of an optimal unifier, a natural generalization of that of a most general unifier.
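For readers unfamiliar with unification, the sketch below performs ordinary most-general unification over categorial types (atoms, variables, and functor categories); the optimal unifier introduced in the paper generalizes this and is not reproduced here. The type encoding is an assumption:

# Types: a string starting with '?' is a variable; a tuple ('/', A, B) or
# ('\\', A, B) is a functor category; anything else is an atomic category.

def walk(t, subst):
    while isinstance(t, str) and t.startswith("?") and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    t = walk(t, subst)
    if t == var:
        return True
    return isinstance(t, tuple) and any(occurs(var, a, subst) for a in t[1:])

def unify(a, b, subst):
    """Most-general unification of two categorial types; returns None on failure."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str) and a.startswith("?"):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if isinstance(b, str) and b.startswith("?"):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        s = unify(a[1], b[1], subst)
        return None if s is None else unify(a[2], b[2], s)
    return None

# Unify two type-schemes assigned to the same atom in different sentences:
# (S / ?x) against (S / NP) yields the substitution {?x: NP}.
print(unify(("/", "S", "?x"), ("/", "S", "NP"), {}))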


Human Factors in Computing Systems | 2006

The effect of speech recognition accuracy rates on the usefulness and usability of webcast archives

Cosmin Munteanu; Ronald M. Baecker; Gerald Penn; Elaine G. Toms; David F. James

The widespread availability of broadband connections has led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. In the absence of transcripts of what was said, users have difficulty searching and scanning for specific topics. This research investigates user needs for transcription accuracy in webcast archives, and measures how the quality of transcripts affects user performance in a question-answering task, and how quality affects overall user experience. We tested 48 subjects in a within-subjects design under 4 conditions: perfect transcripts, transcripts with 25% Word Error Rate (WER), transcripts with 45% WER, and no transcript. Our data reveals that speech recognition accuracy linearly influences both user performance and experience, shows that transcripts with 45% WER are unsatisfactory, and suggests that transcripts having a WER of 25% or less would be useful and usable in webcast archives.
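The WER figures quoted above follow the standard definition: the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the reference length. A plain computation (not the study's tooling):

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

ref = "the widespread availability of broadband connections"
hyp = "the widespread ability of broad band connections"
print(f"WER = {word_error_rate(ref, hyp):.0%}")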


Meeting of the Association for Computational Linguistics | 2002

A Web-based Instructional Platform for Constraint-Based Grammar Formalisms and Parsing

W. Detmar Meurers; Gerald Penn; Frank Richter

We propose the creation of a web-based training framework comprising a set of topics that revolve around the use of feature structures as the core data structure in linguistic theory, its formal foundations, and its use in syntactic processing.
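As a minimal illustration of the core data structure, the sketch below unifies two feature structures encoded as nested dicts; it ignores reentrancy (structure sharing), which the formalisms such a platform covers would handle, and the features used are invented:

def unify_fs(a, b):
    """Unify two feature structures represented as nested dicts.
    Atomic values must match exactly; shared features unify recursively."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None            # atomic value clash means failure
    result = dict(a)
    for feat, val in b.items():
        if feat in result:
            merged = unify_fs(result[feat], val)
            if merged is None:
                return None
            result[feat] = merged
        else:
            result[feat] = val
    return result

# Toy example: a verb's subject requirements unified with a candidate NP.
verb_subj = {"CAT": "np", "AGR": {"NUM": "sg", "PER": "3"}}
candidate = {"CAT": "np", "AGR": {"NUM": "sg"}, "CASE": "nom"}
print(unify_fs(verb_subj, candidate))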


IEEE VGTC Conference on Visualization | 2007

Visualization of uncertainty in lattices to support decision-making

Christopher Collins; M. Sheelagh T. Carpendale; Gerald Penn

Lattice graphs are used as underlying data structures in many statistical processing systems, including natural language processing. Lattices compactly represent multiple possible outputs and are usually hidden from users. We present a novel visualization intended to reveal the uncertainty and variability inherent in statistically-derived lattice structures. Applications such as machine translation and automated speech recognition typically present users with a best-guess about the appropriate output, with apparent complete confidence. Through case studies we show how our visualization uses a hybrid layout along with varying transparency, colour, and size to reveal the lattice structure, expose the inherent uncertainty in statistical processing, and help users make better-informed decisions about statistically-derived outputs.
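One way to quantify the uncertainty such a lattice encodes is to compute a posterior weight for every edge with a forward-backward pass over the DAG and map it onto a visual channel such as opacity. This illustrates that general idea only, not the paper's method; the toy lattice and scores are invented:

from collections import defaultdict

# Toy word lattice: edges (source, target, word, probability-like score),
# listed in topological order by source node so a single pass suffices.
edges = [
    (0, 1, "recognize", 0.6), (0, 1, "wreck a nice", 0.4),
    (1, 2, "speech", 0.7),    (1, 2, "beach", 0.3),
]
START, END = 0, 2

# Forward/backward scores over the DAG give each edge a posterior weight.
forward = defaultdict(float); forward[START] = 1.0
for s, t, _, p in edges:
    forward[t] += forward[s] * p

backward = defaultdict(float); backward[END] = 1.0
for s, t, _, p in reversed(edges):
    backward[s] += p * backward[t]

total = forward[END]
for s, t, word, p in edges:
    posterior = forward[s] * p * backward[t] / total
    # Map confidence to a visual channel, e.g. opacity in [0.2, 1.0].
    alpha = 0.2 + 0.8 * posterior
    print(f"{word!r}: posterior={posterior:.2f}, opacity={alpha:.2f}")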


International Joint Conference on Natural Language Processing | 2009

Summarizing multiple spoken documents: finding evidence from untranscribed audio

Xiaodan Zhu; Gerald Penn; Frank Rudzicz

This paper presents a model for summarizing multiple untranscribed spoken documents. Without assuming the availability of transcripts, the model modifies a recently proposed unsupervised algorithm to detect re-occurring acoustic patterns in speech and uses them to estimate similarities between utterances, which are in turn used to identify salient utterances and remove redundancies. This model is of interest due to its independence from spoken language transcription, an error-prone and resource-intensive process, its ability to integrate multiple sources of information on the same topic, and its novel use of acoustic patterns that extends previous work on low-level prosodic feature detection. We compare the performance of this model with that achieved using manual and automatic transcripts, and find that this new approach is roughly equivalent to having access to ASR transcripts with word error rates in the 33--37% range without actually having to do the ASR, plus it better handles utterances with out-of-vocabulary words.
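As a sketch of the selection step only: given pairwise utterance similarities (which the paper estimates from re-occurring acoustic patterns rather than transcripts), salient but non-redundant utterances can be chosen greedily, MMR-style. The similarity values and trade-off below are invented, and this is not the paper's exact model:

import numpy as np

# Pairwise utterance similarities; in the paper these come from acoustic
# pattern matching, not transcripts. The numbers here are invented.
sim = np.array([
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.2],
    [0.1, 0.2, 1.0, 0.4],
    [0.3, 0.2, 0.4, 1.0],
])

def select_summary(sim, k=2, trade_off=0.7):
    """Greedy MMR-style selection: prefer utterances similar to everything
    (salient) but dissimilar to what has already been selected (non-redundant)."""
    salience = sim.mean(axis=1)
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(sim)):
            if i in selected:
                continue
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            score = trade_off * salience[i] - (1 - trade_off) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

print("selected utterances:", select_summary(sim))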


Human Factors in Computing Systems | 2008

Collaborative editing for improved usefulness and usability of transcript-enhanced webcasts

Cosmin Munteanu; Ronald M. Baecker; Gerald Penn

One challenge in facilitating skimming or browsing through archives of on-line recordings of webcast lectures is the lack of text transcripts of the recorded lecture. Ideally, transcripts would be obtainable through Automatic Speech Recognition (ASR). However, current ASR systems can only deliver, in realistic lecture conditions, a Word Error Rate of around 45% -- above the accepted threshold of 25%. In this paper, we present the iterative design of a webcast extension that engages users to collaborate in a wiki-like manner on editing the ASR-produced imperfect transcripts, and show that this is a feasible solution for improving the quality of lecture transcripts. We also present the findings of a field study carried out in a real lecture environment investigating how students use and edit the transcripts.

Collaboration


Dive into Gerald Penn's collaborations.

Top Co-Authors

Xiaodan Zhu, National Research Council

Bob Carpenter, Carnegie Mellon University

Christopher Collins, University of Ontario Institute of Technology