François G. Germain
Stanford University
Publication
Featured research published by François G. Germain.
IEEE Signal Processing Letters | 2014
François G. Germain; Gautham J. Mysore
Numerous audio signal processing and analysis techniques using non-negative matrix factorization (NMF) have been developed in the past decade, particularly for the task of source separation. NMF-based algorithms iteratively optimize a cost function. However, the correlation between cost functions and application-dependent performance metrics is less well understood. Furthermore, to the best of our knowledge, no formal heuristic to compute a stopping criterion tailored to a given application exists in the literature. In this paper, we examine this problem for the case of supervised and semi-supervised NMF-based source separation and show that iterating these algorithms to convergence is not optimal for this application. We propose several heuristic stopping criteria that we empirically found to be well correlated with source separation performance. Moreover, our results suggest that simply integrating the learning of an appropriate stopping criterion in a sweep for model size selection could lead to substantial performance improvements with minimal additional effort.
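As context for the supervised setting the abstract describes, here is a minimal KL-divergence NMF sketch in NumPy. The ranks, iteration count, and random test spectrograms are illustrative assumptions, not the paper's configuration; tracking the cost per iteration is what makes experimenting with stopping criteria possible.

```python
import numpy as np

def nmf(V, rank, n_iter=100, W=None, seed=0):
    """Multiplicative-update NMF for the KL divergence; pass W to keep bases fixed."""
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((W.shape[1], V.shape[1])) + 1e-3
    eps = 1e-12
    cost = []
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ np.ones_like(V) + eps)
        if learn_W:
            W *= ((V / (W @ H + eps)) @ H.T) / (np.ones_like(V) @ H.T + eps)
        WH = W @ H + eps
        cost.append(np.sum(V * np.log((V + eps) / WH) - V + WH))
    return W, H, cost

# Supervised separation: learn bases for each source, then fix them on the mixture
rng = np.random.default_rng(1)
S1, S2 = rng.random((64, 40)), rng.random((64, 40))   # stand-in magnitude spectrograms
W1, _, _ = nmf(S1, rank=8)
W2, _, _ = nmf(S2, rank=8)
Wfix = np.concatenate([W1, W2], axis=1)
_, H, cost = nmf(S1 + S2, rank=16, W=Wfix)            # only H is updated
mask = (W1 @ H[:8]) / (Wfix @ H + 1e-12)              # Wiener-style mask for source 1
S1_hat = mask * (S1 + S2)
```

A stopping heuristic of the kind the paper studies would monitor a quantity like `cost` (or a proxy correlated with separation quality) and halt before full convergence.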
international conference on acoustics, speech, and signal processing | 2016
François G. Germain; Gautham J. Mysore; Takako Fujioka
When different parts of speech content such as voice-overs and narration are recorded in real-world environments with different acoustic properties and background noise, the difference in sound quality between the recordings is typically quite audible and therefore undesirable. We propose an algorithm to equalize multiple such speech recordings so that they sound like they were recorded in the same environment. As the timbral content of the speech and background noise typically differ considerably, a simple equalization matching results in a noticeable mismatch in the output signals. A single equalization filter affects both timbres equally and thus cannot disambiguate the competing matching equations of each source. We propose leveraging speech enhancement methods in order to separate speech and background noise, independently apply equalization filtering to each source, and recombine the outputs. By independently equalizing the separated sources, our method is able to better disambiguate the matching equations associated with each source. Therefore the resulting matched signals are perceptually very similar. Additionally, by retaining the background noise in the final output signals, most artifacts from speech enhancement methods are considerably reduced and in general perceptually masked. Subjective listening tests show that our approach significantly outperforms simple equalization matching.
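The "simple equalization matching" baseline the abstract argues against can be sketched as a long-term spectral match. The frame size, window, and synthetic signals below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def avg_mag_spectrum(x, n_fft=512, hop=256):
    """Long-term average magnitude spectrum over windowed frames."""
    frames = np.array([x[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def match_eq(reference, source, n_fft=512):
    """Filter `source` so its long-term spectrum matches that of `reference`."""
    gain = avg_mag_spectrum(reference, n_fft) / (avg_mag_spectrum(source, n_fft) + 1e-12)
    h = np.fft.irfft(gain)                           # zero-phase FIR from the gain curve
    h = np.roll(h, n_fft // 2) * np.hanning(n_fft)   # make causal and window it
    y = np.convolve(source, h)
    return y[n_fft // 2 : n_fft // 2 + len(source)]  # compensate the FIR delay
```

A single filter like this acts on speech and background noise alike, which is exactly the limitation the paper addresses by separating the sources and equalizing each independently.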
workshop on applications of signal processing to audio and acoustics | 2009
François G. Germain; Gianpaolo Evangelista
In this paper, we provide a model of the plectrum, or guitar pick, for use in physically inspired sound synthesis. The model draws from the mechanics of beams. The profile of the plectrum is computed in real time based on its interaction with the string, which depends on the movement impressed by the player and the equilibrium of dynamical forces. A condition for the release of the string is derived, which makes it possible to drive the digital waveguide simulating the string to the proper state at release time. The acoustic results are excellent, as verified in the sound examples provided.
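The paper's contribution is the beam-mechanics plectrum model and its release condition; the digital waveguide it drives is standard. For orientation only, here is a generic Karplus-Strong-style plucked string with a noise burst in place of a plectrum excitation (all parameter values are illustrative, and this is not the paper's model):

```python
import numpy as np

def karplus_strong(freq, dur, sr=44100, damping=0.996, seed=0):
    """Generic Karplus-Strong plucked string (noise pluck, lowpass feedback loop)."""
    rng = np.random.default_rng(seed)
    N = int(sr / freq)                 # delay-line length sets the pitch
    line = rng.uniform(-1, 1, N)       # "pluck": fill the delay line with noise
    out = np.empty(int(sr * dur))
    for n in range(len(out)):
        out[n] = line[n % N]
        # averaging two adjacent samples lowpasses the loop, modeling string losses
        line[n % N] = damping * 0.5 * (line[n % N] + line[(n + 1) % N])
    return out

y = karplus_strong(220.0, 1.0)
```

A plectrum model such as the paper's would replace the noise initialization with an excitation computed from the pick-string interaction up to the release instant.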
international conference on acoustics, speech, and signal processing | 1979
Jean-François Abramatic; François G. Germain; Emmanuel Rosencher
The design of 2-D recursive filters with separable denominator transfer functions from the impulse response of a prototype filter is performed in two steps. First the poles of the transfer function of the recursive filter are found using an LMS criterion through an iterative scheme. According to the same criterion, the coefficients of the numerator may then be found by solving linear systems. For prototype filters with real transfer functions, the numerator can also be adjusted by minimizing a Chebyshev criterion.
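To make "separable denominator" concrete: such a 2-D recursion factors into a 1-D recursion along the rows followed by one along the columns. A first-order NumPy sketch follows; the pole values are illustrative, whereas the paper's subject is how to *design* these coefficients from a prototype impulse response via an LMS criterion.

```python
import numpy as np

def iir1(x, b, a):
    """First-order IIR y[n] = b*x[n] + a*y[n-1], applied along the last axis."""
    y = np.empty_like(x, dtype=float)
    y[..., 0] = b * x[..., 0]
    for n in range(1, x.shape[-1]):
        y[..., n] = b * x[..., n] + a * y[..., n - 1]
    return y

def separable_2d_filter(img, a_row=0.7, a_col=0.7):
    """Separable-denominator 2-D recursion: horizontal pass, then vertical pass."""
    y = iir1(img, 1 - a_row, a_row)        # recursion along each row
    y = iir1(y.T, 1 - a_col, a_col).T      # recursion along each column
    return y
```

For an impulse at the origin, the output is the product of two 1-D exponential decays, which is the hallmark of a separable transfer function.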
international conference on acoustics, speech, and signal processing | 2015
François G. Germain; Gautham J. Mysore
Desirable properties of real-world speech enhancement methods include online operation, single-channel operation, operation in the presence of a variety of noise types including non-stationary noise, and no requirement for isolated training examples of the specific speaker and noise type at hand. Methods in the literature typically possess only a subset of these properties. Source separation methods particularly rarely simultaneously possess the first and last properties. We extend universal speech model-based speech enhancement to adaptively learn a noise model in an online fashion. We learn a model from a general corpus of speech in place of speaker-dependent training examples before deployment. This setup provides all of these desirable properties, making it easy to deploy in real-world systems without the need to provide additional training examples, while explicitly modeling speech. Our experimental results show that our method achieves the same performance as in the case in which speaker-dependent training data is available.
electronic imaging | 2015
François G. Germain; Iretiayo A. Akinola; Qiyuan Tian; Steven Lansel; Brian A. Wandell
To speed the development of novel camera architectures we proposed a method, L3 (Local, Linear and Learned), that automatically creates an optimized image processing pipeline. The L3 method assigns each sensor pixel into one of 400 classes, and applies class-dependent local linear transforms that map the sensor data from a pixel and its neighbors into the target output (e.g., CIE XYZ rendered under a D65 illuminant). The transforms are precomputed from training data and stored in a table used for image rendering. The training data are generated by camera simulation, consisting of sensor responses and rendered CIE XYZ outputs. The sensor and rendering illuminant can be equal (same-illuminant table) or different (cross-illuminant table). In the original implementation, illuminant correction is achieved with cross-illuminant tables, and one table is required for each illuminant. We find, however, that a single same-illuminant table (D65) effectively converts sensor data for many different same-illuminant conditions. Hence, we propose to render the data by applying the same-illuminant D65 table to the sensor data, followed by a linear illuminant correction transform. The mean color reproduction error using the same-illuminant table is on the order of 4 ΔE units, which is only slightly larger than the cross-illuminant table error. This approach reduces table storage requirements significantly without substantially degrading color reproduction accuracy.
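A toy version of the classify-then-linearly-transform idea can be written in a few lines. The class rule here (a quantile of the patch mean) is a crude stand-in for the paper's 400 classes, and the data are synthetic; only the per-class least-squares table structure mirrors the described pipeline.

```python
import numpy as np

def extract_patches(img, r=1):
    """All (2r+1)x(2r+1) patches, flattened, scanning interior pixels row by row."""
    H, W = img.shape
    return np.array([img[i - r:i + r + 1, j - r:j + r + 1].ravel()
                     for i in range(r, H - r) for j in range(r, W - r)])

def train_l3(sensor, target, n_classes=4, r=1):
    """Learn one linear transform (with bias) per class, stored in a table."""
    X = extract_patches(sensor, r)
    y = target[r:-r, r:-r].ravel()
    # stand-in classifier: quantile bin of the patch mean
    edges = np.quantile(X.mean(1), np.linspace(0, 1, n_classes + 1)[1:-1])
    cls = np.digitize(X.mean(1), edges)
    table = {}
    for c in range(n_classes):
        m = cls == c
        A = np.hstack([X[m], np.ones((m.sum(), 1))])
        table[c], *_ = np.linalg.lstsq(A, y[m], rcond=None)
    return table, edges

def apply_l3(sensor, table, edges, r=1):
    """Render: classify each patch, then apply its class's precomputed transform."""
    X = extract_patches(sensor, r)
    cls = np.digitize(X.mean(1), edges)
    A = np.hstack([X, np.ones((len(X), 1))])
    out = np.array([A[k] @ table[c] for k, c in enumerate(cls)])
    H, W = sensor.shape
    return out.reshape(H - 2 * r, W - 2 * r)
```

When the sensor-to-target mapping is genuinely linear in the local patch, each per-class transform recovers it exactly; the interest of the method lies in piecewise-linear fits to mappings that are only locally linear.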
workshop on applications of signal processing to audio and acoustics | 2017
François G. Germain; Kurt James Werner
One goal of Virtual Analog modeling of audio circuits is to produce digital models whose behavior matches analog prototypes as closely as possible. Discretization methods provide a systematic approach to generate such models but they introduce frequency response error, such as frequency warping for the trapezoidal method. Recent work showed how using different discretization methods for each reactive element could reduce such error for driving point transfer functions. It further provided a procedure to optimize that error according to a chosen metric through joint selection of the discretization parameters. Here, we extend that approach to the general case of transfer functions with one input and an arbitrary number of outputs expressed as linear combinations of the network variables, and we consider error metrics based on the L2 and the L1 norms. To demonstrate the validity of our approach, we apply the optimization procedure for the response of a Hammond organ vibrato/chorus ladder filter, a 19-output, 36th order filter, where each output frequency response presents many features spread across its passband.
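The frequency-warping error mentioned for the trapezoidal (bilinear) method is easy to exhibit: the method maps analog frequency w_a onto digital frequency w_d via w_a = (2/T)·tan(w_d·T/2), so response features land at compressed digital frequencies. A small NumPy check with an analog one-pole lowpass (sample rate and cutoff are illustrative, not from the paper):

```python
import numpy as np

def bilinear_one_pole(wc, fs):
    """Trapezoidal discretization of H(s) = wc/(s + wc) via s = k(1 - z^-1)/(1 + z^-1)."""
    k = 2.0 * fs
    b = np.array([wc, wc]) / (k + wc)
    a = np.array([1.0, (wc - k) / (k + wc)])
    return b, a

def freqz_at(b, a, w, fs):
    """Evaluate the digital filter's response at angular frequency w (rad/s)."""
    z = np.exp(1j * w / fs)
    return (b[0] + b[1] / z) / (a[0] + a[1] / z)

fs, wc = 48_000.0, 2 * np.pi * 5_000.0        # cutoff high enough to show warping
b, a = bilinear_one_pole(wc, fs)
# the -3 dB point lands at the warped digital frequency, strictly below wc
wd = 2 * fs * np.arctan(wc / (2 * fs))
```

Here the digital cutoff `wd` sits near 4.8 kHz rather than 5 kHz; the paper's optimization procedure selects discretization parameters precisely to control this kind of response error, simultaneously across many outputs.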
conference of the international speech communication association | 2013
François G. Germain; Dennis L. Sun; Gautham J. Mysore
19th International Conference on Digital Audio Effects | 2016
Kurt James Werner; W. Ross Dunkel; François G. Germain
international symposium/conference on music information retrieval | 2013
Zafar Rafii; François G. Germain; Dennis L. Sun; Gautham J. Mysore