
Publication


Featured research published by Philip J. B. Jackson.


IEEE Transactions on Speech and Audio Processing | 2001

Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech

Philip J. B. Jackson; Christine H. Shadle

Almost all speech contains simultaneous contributions from more than one acoustic source within the speaker's vocal tract. In this paper, we propose a method, the pitch-scaled harmonic filter (PSHF), which aims to separate the voiced and turbulence-noise components of the speech signal during phonation, based on a maximum likelihood approach. The PSHF outputs periodic and aperiodic components that are estimates of the respective contributions of the different types of acoustic source. It produces four reconstructed time series signals by decomposing the original speech signal, first, according to amplitude, and then according to power of the Fourier coefficients. Thus, one pair of periodic and aperiodic signals is optimized for subsequent time-series analysis, and another pair for spectral analysis. The performance of the PSHF algorithm was tested on synthetic signals, using three forms of disturbance (jitter, shimmer and additive noise), and the results were used to predict its performance on real speech. Processing recorded speech examples elicited latent features from the signals, demonstrating the PSHF's potential for analysis of mixed-source speech.
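
As a rough illustration of the idea behind such a decomposition, the sketch below (plain NumPy, not the published PSHF) analyses a window containing an integer number of pitch periods so that harmonics of the fundamental fall exactly on DFT bins; the harmonic bins give the periodic estimate and the remaining bins the aperiodic estimate. The number of periods per window, the toy signal, and the function names are assumptions for illustration only.

```python
import numpy as np

def pitch_scaled_split(frame, f0, fs, b=4):
    """Split one analysis frame into periodic/aperiodic parts.

    The frame is assumed to contain exactly b pitch periods, so the
    window length is N = round(b * fs / f0) and harmonics of f0 fall
    on DFT bins b, 2b, 3b, ... (an illustrative simplification of the
    pitch-scaled harmonic filter, not the published algorithm).
    """
    N = int(round(b * fs / f0))
    x = frame[:N]
    X = np.fft.rfft(x)
    harmonic_bins = np.arange(b, len(X), b)   # bins at multiples of f0

    periodic_spec = np.zeros_like(X)
    periodic_spec[harmonic_bins] = X[harmonic_bins]
    aperiodic_spec = X - periodic_spec

    periodic = np.fft.irfft(periodic_spec, n=N)
    aperiodic = np.fft.irfft(aperiodic_spec, n=N)
    return periodic, aperiodic

# toy usage: a 100 Hz harmonic-rich voiced source plus white noise
fs, f0 = 16000, 100.0
t = np.arange(int(0.04 * fs)) / fs
voiced = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 20))
noisy = voiced + 0.1 * np.random.randn(t.size)
p, a = pitch_scaled_split(noisy, f0, fs)
```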


International Conference on Computer Graphics and Interactive Techniques | 2012

Physical face cloning

Bernd Bickel; Peter Kaufmann; Mélina Skouras; Bernhard Thomaszewski; Derek Bradley; Thabo Beeler; Philip J. B. Jackson; Steve Marschner; Wojciech Matusik; Markus H. Gross

We propose a complete process for designing, simulating, and fabricating synthetic skin for an animatronics character that mimics the face of a given subject and its expressions. The process starts with measuring the elastic properties of a material used to manufacture synthetic soft tissue. Given these measurements we use physics-based simulation to predict the behavior of a face when it is driven by the underlying robotic actuation. Next, we capture 3D facial expressions for a given target subject. As the key component of our process, we present a novel optimization scheme that determines the shape of the synthetic skin as well as the actuation parameters that provide the best match to the target expressions. We demonstrate this computational skin design by physically cloning a real human face onto an animatronics figure.
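
The heart of this pipeline is an inverse problem: find a skin rest shape and per-expression actuation parameters that minimise the mismatch between simulated and captured expressions. Below is a heavily simplified, hedged sketch of that optimisation loop; `simulate_skin` is a hypothetical toy stand-in for the paper's physics-based skin simulation, and all sizes are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_vertices, n_act, n_expr = 30, 4, 3
B = rng.normal(size=(n_vertices, n_act))          # toy actuation basis

def simulate_skin(rest, act):
    """Hypothetical stand-in for the physics-based simulation that
    deforms the synthetic skin given rest geometry and actuator values."""
    return rest + B @ act

# captured target expressions (random placeholders here)
targets = [rng.normal(size=n_vertices) for _ in range(n_expr)]

def cost(params):
    """Sum of squared vertex errors over all target expressions."""
    rest = params[:n_vertices]
    acts = params[n_vertices:].reshape(n_expr, n_act)
    return sum(np.sum((simulate_skin(rest, a) - t) ** 2)
               for a, t in zip(acts, targets))

x0 = np.zeros(n_vertices + n_expr * n_act)
res = minimize(cost, x0, method="L-BFGS-B")
rest_opt = res.x[:n_vertices]                      # optimised skin rest shape
```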


International Symposium on 3D Data Processing, Visualization and Transmission | 2004

Speech-driven face synthesis from 3D video

Ioannis A. Ypsilos; Adrian Hilton; Aseel Turkmani; Philip J. B. Jackson

We present a framework for speech-driven synthesis of real faces from a corpus of 3D video of a person speaking. Video-rate capture of dynamic 3D face shape and colour appearance provides the basis for a visual speech synthesis model. A displacement map representation combines face shape and colour into a 3D video. This representation is used to efficiently register and integrate shape and colour information captured from multiple views. To allow visual speech synthesis, viseme primitives are identified from the corpus using automatic speech recognition. A novel nonrigid alignment algorithm is introduced to estimate dense correspondence between 3D face shape and appearance for different visemes. The registered displacement map representation, together with a novel optical flow optimisation using both shape and colour, enables accurate and efficient nonrigid alignment. Face synthesis from speech is performed by concatenation of the corresponding viseme sequence, using the nonrigid correspondence to reproduce both 3D face shape and colour appearance. Concatenative synthesis reproduces both viseme timing and co-articulation. Face capture and synthesis have been performed for a database of 51 people. Results demonstrate synthesis of 3D visual speech animation with a quality comparable to the captured video of a person.
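
A minimal sketch of the concatenative stage only: given a phone sequence from an ASR alignment, map phones to visemes, look up stored displacement-map clips, and crossfade at the joins. The phone-to-viseme table, clip format, and blend length are assumptions for illustration, not the paper's data or mapping.

```python
import numpy as np

# illustrative phone-to-viseme table (an assumption, not the paper's mapping)
PHONE_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                   "f": "labiodental", "v": "labiodental",
                   "aa": "open", "iy": "spread", "uw": "rounded"}

def synthesise(phones, viseme_clips, blend=3):
    """Concatenate stored per-viseme frame sequences with a short crossfade.

    viseme_clips maps a viseme label to an array of shape (frames, H, W)
    holding displacement-map frames captured from the 3D video corpus.
    """
    out = []
    for ph in phones:
        clip = viseme_clips[PHONE_TO_VISEME[ph]].copy()
        if out and blend > 0:
            # linear crossfade between the tail of the output and the clip head
            w = np.linspace(0.0, 1.0, blend)[:, None, None]
            tail = np.asarray(out[-blend:])
            out[-blend:] = list((1 - w) * tail + w * clip[:blend])
            clip = clip[blend:]
        out.extend(clip)
    return np.asarray(out)

# toy usage with random 8x8 "displacement maps"
rng = np.random.default_rng(1)
clips = {v: rng.normal(size=(10, 8, 8)) for v in set(PHONE_TO_VISEME.values())}
frames = synthesise(["b", "aa", "m"], clips)
```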


Journal of the Acoustical Society of America | 2014

Acoustic contrast, planarity and robustness of sound zone methods using a circular loudspeaker array

Philip Coleman; Philip J. B. Jackson; Marek Olik; Martin Møller; Martin Olsen; Jan Abildgaard Pedersen

Since the mid 1990s, acoustics research has been undertaken relating to the sound zone problem (using loudspeakers to deliver a region of high sound pressure while simultaneously creating an area where the sound is suppressed) in order to facilitate independent listening within the same acoustic enclosure. The published solutions to the sound zone problem are derived from areas such as wave field synthesis and beamforming. However, the properties of such methods differ and performance tends to be compared against similar approaches. In this study, the suitability of energy focusing, energy cancelation, and synthesis approaches for sound zone reproduction is investigated. Anechoic simulations based on two zones surrounded by a circular array show each of the methods to have a characteristic performance, quantified in terms of acoustic contrast, array control effort and target sound field planarity. Regularization is shown to have a significant effect on the array effort and achieved acoustic contrast, particularly when mismatched conditions are considered between calculation of the source weights and their application to the system.
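
The quantities compared in this kind of study (acoustic contrast, array effort) can be illustrated with a small NumPy sketch: given transfer-function matrices from the loudspeakers to microphones in the bright and dark zones, one common formulation of contrast maximisation reduces to a regularised generalised eigenvalue problem. The regularisation scheme, matrix sizes, and random transfer functions below are illustrative assumptions, not necessarily the exact methods compared in the paper.

```python
import numpy as np

def contrast_maximising_weights(G_b, G_d, reg=1e-3):
    """Loudspeaker weights maximising the bright/dark energy ratio.

    G_b, G_d: complex transfer matrices (mics x speakers) for the bright
    and dark zones at one frequency. Regularising the dark-zone correlation
    matrix limits array effort (a common formulation; details here are an
    illustrative assumption).
    """
    R_b = G_b.conj().T @ G_b
    R_d = G_d.conj().T @ G_d + reg * np.eye(G_d.shape[1])
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(R_d, R_b))
    w = eigvecs[:, np.argmax(eigvals.real)]
    return w / np.linalg.norm(w)

def acoustic_contrast_db(w, G_b, G_d):
    """Mean bright-zone energy over mean dark-zone energy, in dB."""
    bright = np.mean(np.abs(G_b @ w) ** 2)
    dark = np.mean(np.abs(G_d @ w) ** 2)
    return 10 * np.log10(bright / dark)

# toy example with random transfer functions: 16 speakers, 8 mics per zone
rng = np.random.default_rng(0)
G_b = rng.normal(size=(8, 16)) + 1j * rng.normal(size=(8, 16))
G_d = rng.normal(size=(8, 16)) + 1j * rng.normal(size=(8, 16))
w = contrast_maximising_weights(G_b, G_d)
print(f"contrast: {acoustic_contrast_db(w, G_b, G_d):.1f} dB, "
      f"effort: {np.sum(np.abs(w) ** 2):.2f}")
```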


Computer Speech & Language | 2005

A multiple-level linear/linear segmental HMM with a formant-based intermediate layer

Martin J. Russell; Philip J. B. Jackson

A novel multi-level segmental HMM (MSHMM) is presented in which the relationship between symbolic (phonetic) and surface (acoustic) representations of speech is regulated by an intermediate ‘articulatory’ representation. Speech dynamics are characterised as linear trajectories in the articulatory space, which are transformed into the acoustic space using an articulatory-to-acoustic mapping, and recognition is then performed. The results of phonetic classification experiments are presented for monophone and triphone MSHMMs using three formant-based ‘articulatory’ parameterisations and sets of between 1 and 49 linear articulatory-to-acoustic mappings. The NIST Matched Pair Sentence Segment (Word Error) test shows that, for a sufficiently rich combination of articulatory parameterisation and mappings, differences between these results and those obtained with an optimal classifier are not statistically significant. It is also shown that, compared with a conventional HMM, superior performance can be achieved using an MSHMM with 25% fewer parameters.
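
A hedged sketch of the two-layer idea: a phone segment is modelled as a linear trajectory in a low-dimensional formant-like "articulatory" space, mapped into the acoustic space by a linear transform, and classification picks the phone model with the highest likelihood. The dimensions, noise model, and model parameters below are assumptions, not the paper's trained MSHMMs.

```python
import numpy as np

def segment_log_likelihood(X, slope, intercept, W, bias, noise_var):
    """Log-likelihood of an acoustic segment X (frames x acoustic_dim).

    Illustrative sketch: y(t) = intercept + slope * t in the articulatory
    space, x(t) ~ N(W y(t) + bias, noise_var * I) in the acoustic space.
    """
    T, d = X.shape
    t = np.arange(T)[:, None]                 # time index within the segment
    Y = intercept + slope * t                 # linear articulatory trajectory
    mean = Y @ W.T + bias                     # articulatory-to-acoustic map
    resid = X - mean
    return -0.5 * (np.sum(resid ** 2) / noise_var
                   + T * d * np.log(2 * np.pi * noise_var))

def classify(X, models):
    """Pick the phone whose trajectory model best explains the segment."""
    return max(models, key=lambda ph: segment_log_likelihood(X, *models[ph]))

# toy usage: two phone models in a 2-D "formant" space, 13-D acoustic space
rng = np.random.default_rng(0)
W, bias = rng.normal(size=(13, 2)), np.zeros(13)
models = {"aa": (np.array([0.0, 0.1]), np.array([1.0, 0.5]), W, bias, 1.0),
          "iy": (np.array([0.1, 0.0]), np.array([-1.0, 2.0]), W, bias, 1.0)}
X = rng.normal(size=(20, 13))
print(classify(X, models))
```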


Journal of the Acoustical Society of America | 2014

Personal audio with a planar bright zone

Philip Coleman; Philip J. B. Jackson; Marek Olik; Jan Abildgaard Pedersen

Reproduction of multiple sound zones, in which personal audio programs may be consumed without the need for headphones, is an active topic in acoustical signal processing. Many approaches to sound zone reproduction do not consider control of the bright zone phase, which may lead to self-cancellation problems if the loudspeakers surround the zones. Conversely, control of the phase in a least-squares sense comes at a cost of decreased level difference between the zones and reduced frequency range of cancellation. Single-zone approaches have considered plane wave reproduction by focusing the sound energy into a point in the wavenumber domain. In this article, a planar bright zone is reproduced via planarity control, which constrains the bright zone energy to impinge from a narrow range of angles via projection into a spatial domain. Simulation results using a circular array surrounding two zones show the method to produce superior contrast to the least-squares approach, and superior planarity to the contrast maximization approach. Practical performance measurements obtained in an acoustically treated room verify the conclusions drawn under free-field conditions.
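
A hedged sketch of a planarity-style metric: decompose the bright-zone sound field onto candidate plane-wave directions via steering vectors and measure how concentrated the incident energy is around the dominant direction. The delay-and-sum decomposition and cosine weighting below are an illustrative stand-in, not the paper's exact planarity definition.

```python
import numpy as np

def planarity(pressures, mic_xy, freq, c=343.0, n_angles=360):
    """Fraction-of-energy style planarity estimate for a zone.

    pressures: complex pressures at the zone microphones.
    mic_xy:    microphone positions, shape (mics, 2).
    Energy components over candidate plane-wave directions are weighted by
    the cosine of their angle to the dominant direction and normalised by
    the total component energy (an illustrative assumption).
    """
    k = 2 * np.pi * freq / c
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    steering = np.exp(-1j * k * mic_xy @ directions.T)     # (mics, angles)
    components = np.abs(steering.conj().T @ pressures) ** 2
    dominant = angles[np.argmax(components)]
    weights = np.cos(angles - dominant).clip(min=0.0)
    return float(np.sum(weights * components) / np.sum(components))

# toy usage: a single plane wave from 45 degrees sampled at four microphones
mic_xy = np.array([[0.0, 0.0], [0.05, 0.0], [0.0, 0.05], [0.05, 0.05]])
inc = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
p = np.exp(-1j * 2 * np.pi * 1000 / 343.0 * mic_xy @ inc)
print(planarity(p, mic_xy, 1000.0))   # high value for a single plane wave
```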


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Joint mixing vector and binaural model based stereo source separation

Atiyeh Alinaghi; Philip J. B. Jackson; Qingju Liu; Wenwu Wang

In this paper, the mixing vector (MV) in the statistical mixing model is compared to the binaural cues represented by interaural level and phase differences (ILD and IPD). It is shown that when the sources are close to each other the MV distributions remain quite distinct, whereas the binaural models overlap. On the other hand, the binaural cues are more robust to high reverberation than the MV models. Exploiting this complementary behavior, we introduce a new robust algorithm for stereo speech separation which considers both additive and convolutive noise signals to model the MV and binaural cues in parallel and to estimate probabilistic time-frequency masks. The contribution of each cue to the final decision is also adjusted by empirically weighting the log-likelihoods of the cues. Furthermore, the permutation problem of frequency-domain blind source separation (BSS) is addressed by initializing the MVs based on the binaural cues. Experiments are performed systematically on determined and underdetermined speech mixtures in five rooms with various acoustic properties, including anechoic, highly reverberant, and spatially-diffuse noise conditions. The results in terms of signal-to-distortion ratio (SDR) confirm the benefits of integrating the MV and binaural cues, compared with two state-of-the-art baseline algorithms which use only the MV or only the binaural cues.
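
A minimal sketch of the fusion step only: per-source log-likelihoods from a mixing-vector model and a binaural (ILD/IPD) model are combined with an empirical weight and normalised into soft time-frequency masks. The variable names, array shapes, and softmax-style normalisation are assumptions about the general approach, not the paper's exact algorithm.

```python
import numpy as np

def combined_masks(loglik_mv, loglik_bin, alpha=0.5):
    """Fuse mixing-vector and binaural-cue log-likelihoods into soft masks.

    loglik_mv, loglik_bin: arrays of shape (sources, freqs, frames) giving
    per-source log-likelihoods at each time-frequency point under the
    mixing-vector model and the binaural model respectively.
    alpha weights the two cues (an empirical choice, as in the abstract).
    Returns probabilistic masks of the same shape that sum to 1 over sources.
    """
    fused = alpha * loglik_mv + (1.0 - alpha) * loglik_bin
    fused -= fused.max(axis=0, keepdims=True)          # numerical stability
    masks = np.exp(fused)
    return masks / masks.sum(axis=0, keepdims=True)

# toy usage: 2 sources, 257 frequency bins, 100 frames
rng = np.random.default_rng(0)
m = combined_masks(rng.normal(size=(2, 257, 100)), rng.normal(size=(2, 257, 100)))
assert np.allclose(m.sum(axis=0), 1.0)   # valid probabilistic masks
```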


International Conference on Computer Graphics and Interactive Techniques | 2013

Fabricating translucent materials using continuous pigment mixtures

Marios Papas; Christian Regg; Wojciech Jarosz; Bernd Bickel; Philip J. B. Jackson; Wojciech Matusik; Steve Marschner; Markus H. Gross

We present a method for practical physical reproduction and design of homogeneous materials with desired subsurface scattering. Our process uses a collection of different pigments that can be suspended in a clear base material. Our goal is to determine pigment concentrations that best reproduce the appearance and subsurface scattering of a given target material. To achieve this, we first fabricate a collection of material samples composed of known mixtures of the available pigments with the base material. We then acquire their reflectance profiles using a custom-built measurement device. We use the same device to measure the reflectance profile of a target material. Based on the database of mappings from pigment concentrations to reflectance profiles, we use an optimization process to compute the pigment concentrations that best replicate the target material's appearance. We demonstrate the practicality of our method by reproducing a variety of different translucent materials. We also present a tool that allows the user to explore the range of achievable appearances for a given set of pigments.
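
A hedged sketch of the final fitting step: given reflectance profiles measured for reference pigments and for the target, solve for non-negative pigment concentrations whose combination best matches the target profile. The linear-mixing model used here is a simplification purely for illustration; subsurface scattering is not linear in concentration, and the paper optimises over its measured concentration-to-profile database instead.

```python
import numpy as np
from scipy.optimize import nnls

def fit_concentrations(pigment_profiles, target_profile):
    """Non-negative least-squares fit of pigment concentrations.

    pigment_profiles: (n_radii, n_pigments) reflectance profiles measured
    for reference concentrations of each pigment.
    target_profile:   (n_radii,) measured profile of the target material.
    A linear mixing model is assumed here only to keep the sketch simple.
    """
    concentrations, residual = nnls(pigment_profiles, target_profile)
    return concentrations, residual

# toy usage: 3 pigments, profiles sampled at 50 radial distances
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(50, 3)))
true_c = np.array([0.2, 0.0, 0.7])
c, err = fit_concentrations(A, A @ true_c)   # recovers true_c, err ~ 0
```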


Journal of the Acoustical Society of America | 2000

Frication noise modulated by voicing, as revealed by pitch-scaled decomposition

Philip J. B. Jackson; Christine H. Shadle

A decomposition algorithm that uses a pitch-scaled harmonic filter was evaluated using synthetic signals and applied to mixed-source speech, spoken by three subjects, to separate the voiced and unvoiced parts. Pulsing of the noise component was observed in voiced frication, which was analyzed by complex demodulation of the signal envelope. The timing of the pulsation, represented by the phase of the anharmonic modulation coefficient, showed a step change during a vowel-fricative transition corresponding to the change in location of the noise source within the vocal tract. Analysis of fricatives [see text] demonstrated a relationship between steady-state phase and place, and f0 glides confirmed that the main cause was a place-dependent delay.
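
A hedged NumPy/SciPy sketch of the envelope analysis described: take the noise component, compute its amplitude envelope via the Hilbert transform, and complex-demodulate the envelope at f0; the phase of the resulting coefficient indicates the timing of the noise pulses within the glottal cycle. The window length, toy signal, and normalisation are assumptions for illustration.

```python
import numpy as np
from scipy.signal import hilbert

def anharmonic_modulation(noise, f0, fs):
    """Complex demodulation of a noise component's envelope at f0.

    Returns a complex modulation coefficient: its magnitude reflects the
    depth of voicing-synchronous pulsing of the noise, and its phase the
    timing of the pulses within the glottal cycle (an illustrative sketch
    of the analysis described in the abstract).
    """
    envelope = np.abs(hilbert(noise))
    envelope = envelope - envelope.mean()
    t = np.arange(len(noise)) / fs
    return 2.0 * np.mean(envelope * np.exp(-2j * np.pi * f0 * t))

# toy usage: noise whose amplitude is modulated at 120 Hz
fs, f0 = 16000, 120.0
t = np.arange(int(0.1 * fs)) / fs
noise = (1.0 + 0.5 * np.cos(2 * np.pi * f0 * t - 1.0)) * np.random.randn(t.size)
c = anharmonic_modulation(noise, f0, fs)
print(abs(c), np.angle(c))   # phase should sit near -1.0 rad
```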


Speech Communication | 2009

Statistical identification of articulation constraints in the production of speech

Philip J. B. Jackson; Veena D. Singampalli

We present a statistical technique for identifying critical, dependent and redundant roles played by the articulators during production of English phonemes using articulatory (EMA) data. It identifies a list of critical articulators for each phone based on changes in the distribution of articulator positions. The effect of critical articulation on dependent articulators is derived from inter-articulator correlation. Articulators unaffected or not correlated with the critical articulators are regarded as redundant. The technique was implemented on 1D and 2D distributions of midsagittal articulator coordinates, and the results of this data-driven approach are analyzed in comparison with the phonetic descriptions from the IPA chart. The results using the proposed method gave a closer fit to measured data than those estimated from IPA information alone and highlighted significant factors in the phoneme-to-phone transformation. The proposed algorithm was evaluated against an exhaustive search of critical articulators, and found to be as effective as the exhaustive search in modeling phone distributions with the added advantage of faster execution times. The efficiency of the approach in generating a parsimonious yet accurate representation of the observed articulatory constraints is described, and its potential for applications in speech science and technology discussed.
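
A hedged sketch of the statistical idea: compare each articulator's phone-specific 1-D position distribution with its grand (all-speech) distribution using a KL-style divergence, flag the articulators with the largest shifts as critical, treat articulators strongly correlated with a critical one as dependent, and regard the rest as redundant. The Gaussian assumption, divergence form, and thresholds are illustrative assumptions, not the published algorithm.

```python
import numpy as np

def gauss_kl(mu1, var1, mu2, var2):
    """KL divergence between two 1-D Gaussians, KL(N1 || N2)."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def identify_roles(phone_data, grand_data, crit_thresh=1.0, dep_thresh=0.7):
    """Classify articulators as critical / dependent / redundant.

    phone_data, grand_data: arrays of shape (frames, articulators) of
    midsagittal coordinates for one phone and for all speech respectively.
    """
    n = grand_data.shape[1]
    kl = np.array([gauss_kl(phone_data[:, j].mean(), phone_data[:, j].var(),
                            grand_data[:, j].mean(), grand_data[:, j].var())
                   for j in range(n)])
    critical = set(np.flatnonzero(kl > crit_thresh))
    corr = np.abs(np.corrcoef(grand_data, rowvar=False))
    dependent = {j for j in range(n)
                 if j not in critical and any(corr[j, c] > dep_thresh for c in critical)}
    redundant = set(range(n)) - critical - dependent
    return critical, dependent, redundant

# toy usage: articulator 0 shifts for this phone, articulator 1 tracks it
rng = np.random.default_rng(0)
grand = rng.normal(size=(2000, 3))
grand[:, 1] = 0.9 * grand[:, 0] + 0.1 * grand[:, 1]
phone = grand[:200].copy()
phone[:, 0] += 3.0
print(identify_roles(phone, grand))   # ({0}, {1}, {2})
```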
