George S. Kang
United States Naval Research Laboratory
Publication
Featured research published by George S. Kang.
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989
George S. Kang; Larry J. Fransen
Numerous noise-suppression techniques have been developed for operating at the front end of low-bit-rate digital voice terminals. Some of these techniques have been evaluated by standardized intelligibility tests such as the diagnostic rhyme test (DRT). It is well known that the use of a noise suppressor seldom improves the DRT score, even though listeners have had the impression that speech quality was enhanced. Unfortunately, noise suppressors have only occasionally been evaluated by standardized quality tests. The authors supplement quality test data for reference purposes. They use the diagnostic acceptability measure (DAM) to evaluate the speech quality of the latest 2400-b/s linear-predictive coder (LPC) with a noise suppressor at the front end. They used a spectral subtraction technique for noise suppression. Ten different sets of noisy speech recorded on actual military platforms (such as a helicopter, tank, turboprop, helicopter carrier, or jeep) were used as input sources. The magnitude of the DAM improvement is substantial: as much as six points on average, which is large enough to upgrade speech quality somewhat.
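The front end evaluated here is based on spectral subtraction. As a rough illustration of that family of techniques only (not the authors' implementation; the frame length, overlap, noise-estimation interval, and spectral floor below are assumed values), a minimal magnitude-domain spectral subtraction sketch in Python/NumPy:

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, hop=128, noise_frames=10, floor=0.05):
    """Minimal magnitude spectral subtraction (illustrative only).

    The noise spectrum is estimated from the first `noise_frames` frames,
    which are assumed to contain noise alone.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    out = np.zeros(len(noisy))

    # Average noise magnitude spectrum from the leading (noise-only) frames.
    noise_mag = np.zeros(frame_len // 2 + 1)
    m = min(noise_frames, n_frames)
    for i in range(m):
        seg = noisy[i * hop:i * hop + frame_len] * window
        noise_mag += np.abs(np.fft.rfft(seg))
    noise_mag /= max(m, 1)

    # Subtract the noise estimate from each frame's magnitude and overlap-add.
    for i in range(n_frames):
        seg = noisy[i * hop:i * hop + frame_len] * window
        spec = np.fft.rfft(seg)
        mag, phase = np.abs(spec), np.angle(spec)
        clean_mag = np.maximum(mag - noise_mag, floor * mag)   # spectral floor limits musical noise
        out[i * hop:i * hop + frame_len] += np.fft.irfft(clean_mag * np.exp(1j * phase), frame_len) * window
    return out

# Example: a noise-only lead-in followed by a tone in noise.
fs = 8000
t = np.arange(fs) / fs
clean = np.concatenate([np.zeros(10 * 128), 0.5 * np.sin(2 * np.pi * 440 * t)])
noisy = clean + 0.05 * np.random.randn(len(clean))
enhanced = spectral_subtraction(noisy)
```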
international conference on acoustics, speech, and signal processing | 1985
George S. Kang; Lawrence J. Fransen
A low-bit-rate speech encoder must employ bit-saving measures to achieve intelligible and natural-sounding synthesized speech. Some important measures are: (a) quantization of parameters based on their spectral-error sensitivities (i.e., coarser quantization for spectrally less sensitive parameters), and (b) quantization of parameters in accordance with properties of auditory perception (i.e., coarser quantization of the higher-frequency components of the speech spectral envelope, and finer representation of spectral peaks than valleys). The use of line-spectrum pairs (LSPs) makes it possible to employ these measures more readily than the better-known reflection coefficients do. As a result, the intelligibility of an LSP-based, pitch-excited vocoder operating at 800 bits/second (b/s) can be made as high as 87 for three male speakers (as measured by the Diagnostic Rhyme Test (DRT)), which is only 1.4 points below that of the 2400-b/s LPC. Likewise, the intelligibility of a 4800-b/s nonpitch-excited vocoder is as high as 92.3, which compares favorably with scores from current 9600-b/s vocoders.
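As a toy illustration of measure (a), a per-parameter bit allocation can simply assign coarser steps to the less sensitive, higher-order LSPs. The allocation and values below are made up for illustration and are not the 800-b/s coder's actual tables:

```python
import numpy as np

# Hypothetical allocation: more bits for low-frequency LSPs (spectrally and
# perceptually more sensitive), fewer for high-frequency ones.
BITS_PER_LSP = [5, 5, 4, 4, 4, 3, 3, 3, 2, 2]    # 35 bits total, illustrative only

def quantize_lsp(lsp, bits=BITS_PER_LSP, f_min=0.0, f_max=0.5):
    """Uniformly quantize each normalized LSP frequency with its own bit budget."""
    indices, recon = [], []
    for f, b in zip(lsp, bits):
        levels = 2 ** b
        step = (f_max - f_min) / levels
        idx = int(np.clip((f - f_min) / step, 0, levels - 1))
        indices.append(idx)
        recon.append(f_min + (idx + 0.5) * step)    # mid-rise reconstruction level
    return indices, np.array(recon)

# Example: a plausible 10th-order LSP vector (normalized frequencies, ascending).
lsp = np.array([0.03, 0.06, 0.11, 0.15, 0.21, 0.26, 0.32, 0.37, 0.42, 0.47])
idx, lsp_hat = quantize_lsp(lsp)
print(idx)
print(np.round(lsp_hat, 3))
```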
Journal of the Acoustical Society of America | 2000
George S. Kang; Lawrence J. Fransen
A system that synchronously segments a speech waveform using the pitch period and a center of the pitch waveform. The pitch waveform center is determined by finding a local minimum of a centroid histogram waveform of the low-pass-filtered speech waveform over one pitch period. The speech waveform can then be represented by one or more such pitch waveforms or segments during speech compression, reconstruction, or synthesis. The pitch waveform can be modified by frequency enhancement/filtering or waveform stretching/shrinking in speech synthesis or speech disguise. The utterance rate can also be controlled to speed up or slow down the speech.
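The patent's centroid-histogram construction is not spelled out in this abstract, so the sketch below substitutes a simpler stand-in: estimate the pitch period by autocorrelation, then cut at a local energy minimum within one period so that segments fall between excitation pulses. All function names, thresholds, and the energy-minimum heuristic are invented for illustration and are not the patented method:

```python
import numpy as np

def estimate_pitch_period(x, fs, f_lo=60, f_hi=400):
    """Crude autocorrelation pitch estimate in samples (not the patent's method)."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    lo, hi = int(fs / f_hi), int(fs / f_lo)
    return lo + int(np.argmax(ac[lo:hi]))

def segmentation_offset(x, period):
    """Pick a cut point inside the first pitch period.

    The short-time energy is smoothed over a quarter period and its minimum
    within the first period is taken as the cut point, so cuts land between
    glottal pulses. This replaces the patent's centroid-histogram minimum.
    """
    energy = np.convolve(x[:2 * period] ** 2, np.ones(max(period // 4, 1)), mode='same')
    return int(np.argmin(energy[:period]))

def segment_pitch_synchronously(x, fs):
    period = estimate_pitch_period(x, fs)
    start = segmentation_offset(x, period)
    cuts = range(start, len(x) - period, period)
    return [x[c:c + period] for c in cuts]

# Example: segment a synthetic 200-Hz pulse train sampled at 8 kHz.
fs = 8000
x = np.zeros(fs // 2)
x[::40] = 1.0                                   # impulses every 40 samples (200 Hz)
x = np.convolve(x, np.hanning(20), mode='same') # smooth the pulses a little
segments = segment_pitch_synchronously(x, fs)
print(len(segments), len(segments[0]))
```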
IEEE Transactions on Circuits and Systems | 1984
W. Mikhael; F. Wu; George S. Kang; Lawrence J. Fransen
A new, simple formulation for the choice of the optimum convergence factor μ in adaptive filtering using gradient techniques is given. This leads to several optimum adaptive filtering algorithms, each of which is optimum under the conditions for which it was derived. Several of these algorithms have been tested successfully, proving their optimality and yielding faster and more accurate adaptation than the existing conventional algorithm (CA), which uses a fixed or empirical μ. Two algorithms are derived and examined here. The first, the Homogeneous Algorithm (HA), results in a time-varying μ which is the same for all filter coefficients but is updated at each iteration to yield optimum performance. The second, the Individual Adaptation Algorithm (IAA), has a time-varying μ which is chosen individually for each coefficient at each iteration. The HA and IAA always outperformed the CA. Computer simulations and experimental results are given which are in agreement with theory.
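A hedged sketch of the contrast the paper draws: a gradient (LMS-type) adaptive filter run either with a fixed convergence factor μ or with a factor recomputed at every iteration. The time-varying rule below is a simple power-normalized stand-in in the spirit of a per-iteration optimum μ; the paper's exact optimum-μ expressions are not reproduced here:

```python
import numpy as np

def lms_identify(x, d, n_taps, mu=None, eps=1e-8):
    """Adaptive FIR system identification by gradient descent.

    If `mu` is a number, it is used as a fixed convergence factor (the
    conventional algorithm). If `mu` is None, a time-varying factor is
    recomputed every iteration from the current input power (an NLMS-style
    stand-in, not the paper's HA or IAA).
    """
    w = np.zeros(n_taps)
    errs = np.zeros(len(x))
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]        # most recent inputs first
        e = d[n] - w @ u                         # a-priori error
        errs[n] = e
        step = mu if mu is not None else 1.0 / (eps + u @ u)
        w += step * e * u                        # gradient update
    return w, errs

# Example: identify an unknown 4-tap FIR system from noisy observations.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h_true = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, h_true)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w_fixed, _ = lms_identify(x, d, 4, mu=0.01)
w_auto, _ = lms_identify(x, d, 4, mu=None)
print(np.round(w_fixed, 3), np.round(w_auto, 3))
```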
international conference on acoustics, speech, and signal processing | 1984
George S. Kang; Mark L. Lidd
The purpose of an automatic gain control (AGC) is to self-adjust the front-end gain of the LPC analyzer in such a way that the speech waveform is more accurately quantized by the analog-to-digital converter. Tests in the past have indicated that properly amplified speech produces higher intelligibility scores at the narrowband LPC output because both filter and excitation parameters are more accurately estimated. In addition, properly amplified input speech results in properly amplified speech at the receiver, which is highly desirable for listening in a noisy environment.
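As a hedged sketch of the idea only (not the paper's AGC; the target level, time constants, and gain limit are assumed values), a simple envelope-following gain control that steers the input toward a target level before A/D-style quantization:

```python
import numpy as np

def agc(x, target=0.25, attack=0.01, release=0.001, max_gain=30.0, eps=1e-8):
    """Steer the signal level toward `target` with a smoothed, bounded gain.

    `attack`/`release` are per-sample smoothing constants for rising and
    falling level estimates; all constants here are illustrative.
    """
    level = target                    # start near unity gain
    out = np.empty_like(x)
    for n, s in enumerate(x):
        inst = abs(s)
        coef = attack if inst > level else release
        level = (1 - coef) * level + coef * inst          # envelope follower
        gain = min(target / (level + eps), max_gain)      # bounded gain toward target
        out[n] = s * gain
    return out

# Example: a quiet tone is brought up toward the target level.
fs = 8000
t = np.arange(fs) / fs
quiet = 0.02 * np.sin(2 * np.pi * 300 * t)
boosted = agc(quiet)
print(round(float(np.abs(quiet).max()), 3), round(float(np.abs(boosted).max()), 3))
```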
Journal of the Acoustical Society of America | 1996
George S. Kang; Lawrence J. Fransen
A voice communication processing system and method for processing a speech waveform as a digital bit stream having a reduced number of bits representing speech parameters. The bit representation of amplitude parameters is reduced by storing only probable amplitude parameter transitions corresponding to amplitude parameter indices in an amplitude table and by jointly encoding the amplitude parameter indices over multiple frames. The bit representation of the pitch period is reduced by storing a range of pitch periods in a pitch table and by jointly encoding pitch period indices corresponding to an average pitch period over two frames. The bit representation of the vocal tract filter coefficients is reduced by storing only probable filter coefficient transitions corresponding to filter coefficient indices in a filter coefficient table and by jointly encoding the filter coefficient indices over two frames. Voicing decisions are inferred from the associated vocal tract filter coefficient index obtained by searching the filter coefficient table, where the table is divided according to the voicing decisions, and thus separate voicing decisions do not have to be transmitted. By providing a reduced bit representation of the various speech parameters as explained above, the present invention processes the speech waveform at a more efficient data rate. In addition, the present invention converts prediction coefficients (PCs) into line spectrum pairs (LSPs) to be used as filter parameters when performing linear predictive coding (LPC) analysis. Thus, by using LSPs, the present invention is able to more efficiently encode and decode speech.
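One of the bit-saving devices above is joint encoding of table indices across frames. A minimal sketch of that arithmetic follows; the 40-entry pitch-table size is hypothetical and is not taken from the patent:

```python
import math

def joint_encode(i1, i2, table_size):
    """Pack two per-frame table indices into one joint index."""
    return i1 * table_size + i2

def joint_decode(joint, table_size):
    """Recover the two per-frame indices from the joint index."""
    return divmod(joint, table_size)

# Example with a hypothetical 40-entry pitch table: coding the indices
# separately needs 2 * 6 = 12 bits, while the joint index of the pair
# fits in ceil(log2(40 * 40)) = 11 bits.
table_size = 40
i1, i2 = 17, 5
joint = joint_encode(i1, i2, table_size)
assert joint_decode(joint, table_size) == (i1, i2)
print(joint, math.ceil(math.log2(table_size ** 2)), "bits for the pair")
```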
IEEE Transactions on Circuits and Systems | 1987
George S. Kang; Lawrence J. Fransen
The presence of noise in the input speech has severely adverse effects on the speech produced by a low-bit-rate voice terminal. This paper describes an adaptive noise-cancellation filter designed to suppress the broad-band, nonstationary, and intense noise often encountered in military tracked vehicles, helicopters, and high-performance aircraft. The adaptive noise-cancellation filter was developed for real-time operation using a TMS32010 microprocessor. The filter was tested under various environmental conditions with a wide range of filter parameters. According to our measurements, it reduces the noise floor by 10 to 15 dB without degrading the voice quality.
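A hedged sketch of a two-input adaptive noise canceller of the kind described: a reference sensor observes the noise, an LMS filter shapes it to match the noise reaching the speech channel, and the filtered estimate is subtracted. The filter length, step size, and toy noise path below are assumptions; the real-time TMS32010 implementation is not reproduced:

```python
import numpy as np

def noise_canceller(primary, reference, n_taps=32, mu=0.005):
    """Two-input adaptive canceller: filter the reference noise to match the
    noise in the primary (speech) channel and subtract it; the error signal
    is the enhanced output."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        u = reference[n - n_taps + 1:n + 1][::-1]   # recent reference samples
        e = primary[n] - w @ u                      # speech + residual noise
        out[n] = e
        w += mu * e * u                             # LMS update
    return out

# Example: a tone standing in for speech, corrupted by a filtered copy of the
# noise that the reference microphone picks up directly.
rng = np.random.default_rng(1)
fs = 8000
t = np.arange(2 * fs) / fs
speech = 0.3 * np.sin(2 * np.pi * 250 * t)
noise_ref = rng.standard_normal(len(t))
noise_at_primary = np.convolve(noise_ref, [0.6, 0.3, -0.2])[:len(t)]
enhanced = noise_canceller(speech + noise_at_primary, noise_ref)
```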
international conference on acoustics, speech, and signal processing | 1998
David A. Heide; George S. Kang
Throughout the history of telecommunication, speech has rarely been transmitted with its full analog bandwidth (0 to 8 kHz or more) due to limitations in channel bandwidth. This impaired legacy continues with tactical voice communication. The passband of a voice terminal is typically 0 to 4 kHz. Hence, high-frequency speech components (4 to 8 kHz) are removed prior to transmission. As a result, speech intelligibility suffers, particularly for low-data-rate vocoders. In this paper, we describe our speech-processing technique, which permits some of the upperband speech components to be translated into the passband of the vocoder. According to our test results, speech intelligibility is improved by as much as three to four points even for the Department of Defense-standard mixed excitation linear predictor (MELP) 2.4 kb/s vocoder. Note that speech intelligibility is improved without expanding the transmission bandwidth or compromising interoperability with others.
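As a hedged illustration of moving upperband energy into the coder's 0 to 4 kHz passband (the authors' actual translation scheme is not described in this abstract; the FFT-bin shift and mixing level below are assumptions of this sketch, not the paper's method):

```python
import numpy as np

def translate_upperband(x, fs=16000, lowband_hz=4000, mix=0.25):
    """Fold the 4-8 kHz band down into 0-4 kHz before a 4-kHz-passband coder.

    Implemented here by shifting the upper-band FFT bins down by 4 kHz and
    adding them, at a reduced level, to the lowband spectrum. This is only a
    plausible stand-in for the translation idea described in the paper.
    """
    n = len(x)
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    cut = np.searchsorted(freqs, lowband_hz)     # bin index of the 4-kHz edge
    lowband = spec.copy()
    lowband[cut:] = 0                            # what a 0-4 kHz coder would keep
    shifted = np.zeros_like(spec)
    upper = spec[cut:2 * cut]                    # 4-8 kHz bins
    shifted[:len(upper)] = upper                 # moved down by 4 kHz
    return np.fft.irfft(lowband + mix * shifted, n)

# Example: wideband noise standing in for 16-kHz-sampled speech.
x = np.random.randn(16000)
y = translate_upperband(x)
```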
international conference on acoustics, speech, and signal processing | 1976
George S. Kang; David C. Coulter
This paper presents an analysis/synthesis method whereby speech may be transmitted at 600 bps, a data rate that is less than 1 percent of the PCM transmission rate for the original speech. This R&D effort was motivated by the pressing need for very-low-data-rate (VLDR) voice digitizers to meet some of the current military voice communication requirements. The use of a VLDR voice digitizer makes it possible to transmit speech signals over adverse channels which support data rates of only a few hundred bps, or to transmit speech signals over more favorable channels with redundancies for error protection and other useful applications. The 600 bps synthesized speech loses some of its original speech quality, but the intelligibility is sufficiently high to permit the use of the system in certain specialized military applications. One of the most attractive features of the 600 bps voice digitizer is that it is a simple extension of the 2400 bps linear predictive encoder (LPE), which has been under intensive investigation by various government agencies, including the Navy, and is presently entering advanced development. In essence, the 600 bps voice digitizer is a combination of an LPE and a formant vocoder, realized by adding a processor to the existing 2400 bps LPE. This add-on processor converts the 2400 bps speech data to 600 bps speech data at the transmitter, and reconverts the data to 2400 bps at the receiver.
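A very rough sketch of the add-on idea under stated assumptions: the transmitter keeps only a subset of the 2400-bps parameter frames, and the receiver interpolates the missing ones before normal LPC synthesis. The decimation factor and the use of plain linear interpolation are illustrative only; the actual converter also re-quantizes the parameters:

```python
import numpy as np

def downconvert_frames(params_2400, keep_every=4):
    """Transmit only every `keep_every`-th parameter frame (a crude stand-in
    for the 2400-to-600 bps add-on; the real converter also re-quantizes)."""
    return params_2400[::keep_every]

def upconvert_frames(params_600, keep_every=4, n_frames=None):
    """Reconstruct the dropped frames at the receiver by linear interpolation."""
    n_kept, dim = params_600.shape
    n_frames = n_frames or (n_kept - 1) * keep_every + 1
    kept_pos = np.arange(n_kept) * keep_every
    out_pos = np.arange(n_frames)
    return np.stack(
        [np.interp(out_pos, kept_pos, params_600[:, j]) for j in range(dim)], axis=1)

# Example: 12 frames of 10 hypothetical LPC parameters each.
rng = np.random.default_rng(2)
frames = np.cumsum(rng.standard_normal((12, 10)) * 0.1, axis=0)   # slowly varying
sent = downconvert_frames(frames)            # 3 frames kept instead of 12
recovered = upconvert_frames(sent, n_frames=12)
print(frames.shape, sent.shape, recovered.shape)
```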
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1987
George S. Kang; Lawrence J. Fransen
Line-spectrum pairs (LSPs) are frequency-domain parameters similar to formant frequencies. Thus, they have frequency-selective spectral-error characteristics which allow LSP quantization in accordance with auditory perception. In addition, the ease of estimating the spectral-error sensitivity of each line spectrum makes it possible to encode each line spectrum efficiently. This correspondence, for the first time, demonstrates that a 31-bit representation of LSPs provides intelligibility similar to that of a 41-bit representation of reflection coefficients in a current 2400-bit/s LPC. Even with a 12-bit quantization of LSPs, the loss of speech intelligibility is minor, only 2.4 points below that of a 41-bit quantization of reflection coefficients as measured by the diagnostic rhyme test (DRT), which tests initial-consonant discrimination.
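For readers unfamiliar with line-spectrum pairs, a textbook-style sketch of how they are obtained from prediction coefficients is given below. This only shows what LSPs are, not the quantization scheme of the correspondence; the example coefficients are arbitrary but stable:

```python
import numpy as np

def lpc_to_lsp(a):
    """Line-spectrum frequencies (radians) from predictor coefficients.

    `a` holds a_1..a_p of A(z) = 1 - sum_k a_k z^-k. The LSPs are the angles
    of the unit-circle roots of P(z) = A(z) + z^-(p+1) A(1/z) and
    Q(z) = A(z) - z^-(p+1) A(1/z); the trivial roots at z = +1 and z = -1
    and one of each conjugate pair are discarded.
    """
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    P = np.concatenate((A, [0.0])) + np.concatenate(([0.0], A[::-1]))
    Q = np.concatenate((A, [0.0])) - np.concatenate(([0.0], A[::-1]))
    angles = []
    for poly in (P, Q):
        for r in np.roots(poly):
            ang = np.angle(r)
            if 1e-3 < ang < np.pi - 1e-3:    # keep upper half-plane, drop z = +/-1
                angles.append(ang)
    return np.sort(angles)

# Example: an arbitrary stable 4th-order predictor; the four LSPs come out
# ascending in (0, pi), with the P- and Q-roots interleaved.
print(np.round(lpc_to_lsp([1.2, -0.8, 0.3, -0.1]), 3))
```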