Publication


Featured research published by Tony S. Verma.


International Conference on Acoustics, Speech, and Signal Processing | 1999

Sinusoidal modeling using frame-based perceptually weighted matching pursuits

Tony S. Verma; Teresa H. Meng

We propose a method for sinusoidal modeling that takes into account the psychoacoustics of human hearing using a frame-based perceptually weighted matching pursuit. Working on blocks of the input signal, a set of sinusoidal components for each block is iteratively extracted, taking perceptual significance into consideration by using extensions to the well-known matching pursuits algorithm. These extensions allow information about the time-varying masking threshold of the input signal to be included during the pursuit. The blocks are overlap-added to reconstruct the entire signal. Although the perceptually weighted matching pursuit on each block can iterate until the error between the original and the reconstructed signal is zero, lower order approximations are possible by stopping the pursuit when the error becomes imperceptible to the human ear or by stopping the pursuit after a number of the perceptually most significant sinusoidal elements are found. The proposed sinusoidal model finds use in many applications including signal modifications and compression.
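
The greedy extraction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it works on a single unwindowed frame, restricts candidate sinusoids to DFT bin frequencies, and uses a hypothetical per-bin `weights` list as a stand-in for masking-threshold information. The function name and parameters are assumptions made for illustration.

```python
import math

def weighted_matching_pursuit(frame, weights, n_atoms):
    """Greedily extract n_atoms sinusoids from one frame.

    weights[k] is a hypothetical perceptual weight for DFT bin k (for
    example, the inverse of a masking threshold). It is an assumption of
    this sketch, not the weighting defined in the paper.
    Returns (atoms, residual); each atom is (bin, amplitude, phase).
    """
    n = len(frame)
    residual = list(frame)
    atoms = []
    for _ in range(n_atoms):
        best = None
        for k in range(1, n // 2):
            cos_k = [math.cos(2.0 * math.pi * k * i / n) for i in range(n)]
            sin_k = [math.sin(2.0 * math.pi * k * i / n) for i in range(n)]
            # Least-squares projection of the residual onto bin k.
            c = 2.0 / n * sum(r * v for r, v in zip(residual, cos_k))
            s = 2.0 / n * sum(r * v for r, v in zip(residual, sin_k))
            # Weighted energy decides which sinusoid is most significant.
            score = weights[k] * (c * c + s * s)
            if best is None or score > best[0]:
                best = (score, k, c, s, cos_k, sin_k)
        _, k, c, s, cos_k, sin_k = best
        # c*cos(x) + s*sin(x) == A*cos(x + phase)
        atoms.append((k, math.hypot(c, s), math.atan2(-s, c)))
        residual = [r - c * ck - s * sk
                    for r, ck, sk in zip(residual, cos_k, sin_k)]
    return atoms, residual
```

With flat weights the pursuit picks the most energetic sinusoid first; raising the weight of one bin makes a quieter component in that bin win instead, which is the effect the perceptual weighting is meant to have. Stopping after `n_atoms` iterations corresponds to the "fixed number of most significant elements" stopping rule mentioned in the abstract.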


Computer Music Journal | 2000

Extending Spectral Modeling Synthesis with Transient Modeling Synthesis

Tony S. Verma; Teresa H. Meng

Sinusoidal modeling has enjoyed a rich history in both speech and music applications, including sound transformations, compression, denoising, and auditory scene analysis. For such applications, the underlying signal model must efficiently capture salient audio features (Goodwin 1998). In this article, we present an accurate, efficient, and flexible three-part model for audio signals consisting of sines, transients, and noise by extending spectral modeling synthesis (SMS) (Serra and Smith 1990) with an explicit flexible transient model called transient-modeling synthesis (TMS). The sinusoidal transformation system (STS) (McAulay and Quatieri 1986) and SMS find the slowly varying sinusoidal components in a signal using spectral-peak-picking algorithms. Subtracting the synthesized sinusoids from the original signal creates a residual consisting of transients and noise (Serra 1989; George and Smith 1992). However, sinusoids do not model this residual well. Although it is possible to model transients and noise by a sum of sinusoidal signals (as with the Fourier transform), it is neither efficient, because transient and noisy signals require many sinusoids for their description, nor meaningful, because transients are short-lived signals, while the sinusoidal model uses sinusoids that are active on a much larger time scale. In the STS system (generally applied to speech), the transient + noise residual is often masked sufficiently to be ignored (McAulay and Quatieri 1986). In music applications, this residual is often important to the integrity of the signal. The SMS system extends the sinusoidal model by explicitly modeling the residual as slowly varying filtered white noise. Although this technique has been very successful, transients do not fit well into this model, because transients modeled as filtered noise lose sharpness in their attack and tend to sound dull. Because transients are


International Conference on Acoustics, Speech, and Signal Processing | 1998

An analysis/synthesis tool for transient signals that allows a flexible sines+transients+noise model for audio

Tony S. Verma; Teresa H. Meng

We present a flexible analysis/synthesis tool for transient signals that extends current sinusoidal and sines+noise models for audio to sines+transients+noise. The explicit handling of transients provides a more realistic and robust signal model. Because the transient model presented is the frequency domain dual to sinusoidal modeling, it has similar flexibility and allows for a wide range of transformations on the parameterized signal. In addition, due to this duality, a major portion of the transient model is sinusoidal modeling performed in the frequency domain. In order to make the transient and sinusoidal models work more effectively together, we present a formulation of sinusoidal modeling (and therefore transient modeling) in terms of matching pursuits and overlap-add synthesis. This formulation provides a tight coupling between the sines+transients+noise model because it allows a simple heuristic, based on tonality, as to when an audio signal should be modeled as sines and/or transients and/or noise.
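
The duality mentioned in this abstract can be made concrete with a small sketch. It assumes a DCT-II as the mapping from time to frequency, which the abstract itself does not specify, and the helper names are hypothetical. An impulse in time becomes a cosine along the frequency axis, and the later the impulse occurs in the block, the faster that cosine oscillates, so a sinusoidal analysis applied to the transformed block can locate transients.

```python
import math

def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi * (i + 0.5) * k / N)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def impulse(n, position):
    """Length-n block containing a single unit impulse (an idealized transient)."""
    return [1.0 if i == position else 0.0 for i in range(n)]

def sign_changes(seq):
    """Number of sign changes, a crude measure of oscillation rate."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a * b < 0)

# An impulse at sample p transforms to cos(pi * (p + 0.5) * k / N) over k,
# so its oscillation rate across frequency encodes the onset time p.
early = dct_ii(impulse(64, 4))
late = dct_ii(impulse(64, 20))
print(sign_changes(early), sign_changes(late))
```

In this idealized case the number of sign changes across the transformed block tracks the impulse position, mirroring how the frequency of a sinusoid in time tracks the position of a spectral peak.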


International Conference on Acoustics, Speech, and Signal Processing | 1998

Multiresolution sinusoidal modeling for wideband audio with modifications

Scott N. Levine; Tony S. Verma; Julius O. Smith

We describe a computationally efficient method of generating more accurate sinusoidal parameters {amplitude, frequency, phase} from a wideband polyphonic audio source in a multiresolution, non-aliased fashion. This significantly improves upon previous work of sinusoidal modeling that assumes a single-pitched monophonic source, such as speech or an individual musical instrument, while using approximately the same number of sinusoids. In addition to a more general analysis, we can now perform high-quality modifications such as time-stretching and pitch-shifting on polyphonic audio with ease.


International Conference on Acoustics, Speech, and Signal Processing | 2000

A 6Kbps to 85Kbps scalable audio coder

Tony S. Verma; Teresa H. Meng

Scalable audio coding is important in network environments, such as the Internet, where bandwidth is not guaranteed, packet loss is common, and client connection data rates are heterogeneous. Signal models provide a general framework for attacking a wide range of challenges in the unicast delivery of real-time audio over packet switched networks. The specific signal model in this work generates a parametric representation for general wide-band audio signals. The model consists of three complementary components: sines, transients, and noise. Because the human hearing system ultimately judges the validity of a model for audio signals, psychoacoustic principles are explicitly considered in the three part model. Once analyzed, the parameters are quantized, compressed and packed into a single 85Kbps bit-stream. From this bit-stream, bit-streams at several bit-rates between 6Kbps and 85Kbps may be readily extracted. The audio coder offers a wide range of scalability while the audio quality of the coding scheme gracefully degrades from perceptually lossless to low-quality.


International Conference on Acoustics, Speech, and Signal Processing | 1996

The digital prolate spheroidal window

Tony S. Verma; Stefan Bilbao; Teresa H. Meng

The optimal window, the time limited sequence whose energy is most concentrated in a finite frequency interval, is related to a particular discrete prolate spheroidal sequence. The optimal window is actually a family of windows with many degrees of freedom. The Kaiser (1974) window is an approximation to this optimal window. Kaiser used this approximation because the standard method employed to compute the optimal window is numerically ill-conditioned. We show the actual optimal window can be efficiently computed by using an alternative formulation of the discrete prolate spheroidal sequences. We then give a set of design formulas to generate the optimal window for the desired window length, mainlobe width, and relative peak sidelobe height.
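
As an illustration of the definition in this abstract, the sketch below finds the unit-energy sequence with maximal energy in the band |f| <= W (in cycles per sample) by power iteration on the band-limiting kernel. This is the direct formulation, which the abstract notes is numerically ill-conditioned in general, so it is only suitable for small lengths and modest bandwidths. It is not the alternative formulation proposed in the paper, and the function name and parameters are assumptions.

```python
import math

def optimal_window(n, half_bandwidth, iterations=400):
    """Approximate the zeroth discrete prolate spheroidal sequence.

    half_bandwidth is W in cycles per sample (0 < W < 0.5).
    Returns (window, concentration), where concentration is the fraction
    of the window's energy that lies in |f| <= W.
    """
    def kernel(m, k):
        if m == k:
            return 2.0 * half_bandwidth
        d = m - k
        return math.sin(2.0 * math.pi * half_bandwidth * d) / (math.pi * d)

    a = [[kernel(m, k) for k in range(n)] for m in range(n)]
    w = [1.0] * n  # symmetric start, so odd-symmetric eigenvectors are excluded
    for _ in range(iterations):
        w = [sum(row[k] * w[k] for k in range(n)) for row in a]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # Rayleigh quotient of the unit-norm window gives the in-band energy fraction.
    concentration = sum(w[m] * sum(a[m][k] * w[k] for k in range(n))
                        for m in range(n))
    return w, concentration

window, in_band = optimal_window(32, 2.0 / 32)
print(round(in_band, 4))
```

Widening the band raises the achievable concentration at the cost of a wider mainlobe, which is the trade-off the design formulas in the paper address.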


Journal of New Music Research | 2002

Perception and Adjustment of Pitch in Inharmonic String Instrument Tones

Hanna Järveläinen; Tony S. Verma; Vesa Välimäki

The effect of inharmonicity on pitch was measured by listening tests at five fundamental frequencies. Inharmonicity was defined in a way typical of string instruments, such as the piano, where all partials are elevated in a systematic way. It was found that the pitch judgment is usually dominated by a partial other than the fundamental; however, with a high degree of inharmonicity the fundamental became important as well. Guidelines are given for compensating for the pitch difference between harmonic and inharmonic tones in digital sound synthesis.
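
A short sketch of what "systematically elevated partials" can look like, assuming the standard stiff-string model f_n = n * f0 * sqrt(1 + B * n^2) with inharmonicity coefficient B. Whether the paper uses exactly this definition is an assumption here, and the function names and the example value of B are illustrative only.

```python
import math

def partial_frequencies(f0, b, count):
    """Partial frequencies of a stiff string: f_n = n * f0 * sqrt(1 + B * n^2).

    f0 is the nominal fundamental in Hz and b is the inharmonicity
    coefficient B (b = 0 gives a perfectly harmonic series).
    """
    return [n * f0 * math.sqrt(1.0 + b * n * n) for n in range(1, count + 1)]

def cents_sharp(frequency, reference):
    """Deviation of frequency above reference, in cents."""
    return 1200.0 * math.log2(frequency / reference)

# Illustrative values only: a 100 Hz tone with B = 0.001.
stretched = partial_frequencies(100.0, 0.001, 10)
for n, f in enumerate(stretched, start=1):
    print(n, round(f, 1), round(cents_sharp(f, n * 100.0), 1))
```

Under this model the deviation from the harmonic series grows with partial number, so a listener who weights a higher partial hears a pitch that differs from the one implied by the fundamental.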


Data Compression Conference | 1998

A scalable entropy code

Tony S. Verma; Teresa H. Meng

Summary form only given. We present an algorithm for constructing entropy codes that allow progressive transmission. The algorithm constructs codes by forming an unbalanced tree in a fashion similar to Huffman coding. It differs, however, in that nodes are combined in a rate-distortion sense. Because nodes are formed with both rate and distortion in mind, each internal tree node, in addition to each leaf node, has a reconstruction vector and a path map, or codeword, associated with it. The code associated with the leaf nodes is a lossless, asymptotically optimal (for many sources), prefix code. The codes associated with internal nodes are lossy prefix codes, but have lower average length than the lossless code. Using codes associated with the tree and pruned subtrees, an encoded source can be reconstructed with higher fidelity as more bits become available, thereby allowing successive approximation. In addition, because the lossless code is asymptotically optimal for many sources, the cost of using the lossless progressive code can be made arbitrarily small for these sources.
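
The idea of attaching a reconstruction value to every tree node can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: nodes are merged by probability exactly as in ordinary Huffman coding rather than in a rate-distortion sense, the source symbols are scalars, and each internal node's reconstruction is taken to be the probability-weighted mean of its children. All names are hypothetical.

```python
import heapq
import itertools

class Node:
    def __init__(self, prob, value, children=()):
        self.prob = prob
        self.value = value        # reconstruction value for this node
        self.children = children  # empty for leaves, (left, right) otherwise

def build_tree(symbols):
    """symbols: list of (value, probability) pairs. Returns the root Node."""
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    heap = [(p, next(tie), Node(p, v)) for v, p in symbols]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, a = heapq.heappop(heap)
        p1, _, b = heapq.heappop(heap)
        p = p0 + p1
        centroid = (p0 * a.value + p1 * b.value) / p
        heapq.heappush(heap, (p, next(tie), Node(p, centroid, (a, b))))
    return heap[0][2]

def codebook(node, prefix=""):
    """Lossless prefix code: maps each leaf value to its bit string."""
    if not node.children:
        return {node.value: prefix}
    book = {}
    for bit, child in zip("01", node.children):
        book.update(codebook(child, prefix + bit))
    return book

def decode_prefix(root, bits):
    """Follow as many bits as are available and return that node's value."""
    node = root
    for bit in bits:
        if not node.children:
            break
        node = node.children[int(bit)]
    return node.value

root = build_tree([(0.0, 0.5), (1.0, 0.25), (2.0, 0.125), (3.0, 0.125)])
word = codebook(root)[3.0]
print([decode_prefix(root, word[:i]) for i in range(len(word) + 1)])
```

Decoding successively longer prefixes of a codeword walks from the root toward the leaf, so the reconstruction is refined as more bits arrive and becomes exact once the full codeword is available.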


Journal of the Acoustical Society of America | 1998

A flexible analysis/synthesis tool for transient signals

Tony S. Verma; Teresa H. Meng

An overview is given of a flexible analysis/synthesis tool for transient signals that effectively extends the spectral modeling synthesis (SMS) parametrization of signals from sinusoids+noise to sinusoids+transients+noise. The extended model, by explicitly handling transients, provides a more realistic and robust signal analysis/transformation/synthesis tool. Although others have pointed out the need for partitioning SMS into sinusoids+transients+noise, they have not provided a flexible model for transients. The need for an explicit transient model arises because SMS analyzes a signal by first modeling the sinusoidal components of the signal. SMS then subtracts these sinusoidal components from the original signal, leaving a residual signal that contains transients+noise. One drawback of SMS is that it models this transient+noise residual solely as filtered noise. Our transient model, when used on this residual, first detects where possible transients occur, models the transients, then removes transients from th...


Archive | 1998

System and method for multiresolution scalable audio signal encoding

Scott N. Levine; Tony S. Verma

Collaboration


Top Co-Authors

Hanna Järveläinen

Helsinki University of Technology
