Richard V. Cox
AT&T Labs
Publications
Featured research published by Richard V. Cox.
IEEE Journal on Selected Areas in Communications | 1992
Juin-Hwey Chen; Richard V. Cox; Yen-Chun Lin; Nikil S. Jayant; Melvin J. Melchner
A low-delay code-excited linear prediction (LD-CELP) speech coder which is expected to be standardized in 1992 as a CCITT G Series Recommendation for universal applications of speech coding at 16 kb/s is presented. The coder achieves a one-way coding delay of less than 2 ms by making both the LPC predictor and the excitation gain backward-adaptive and by using a small excitation vector size of five samples. The official CCITT laboratory tests revealed that the speech quality of this 16 kb/s LD-CELP coder is either equivalent to or better than that of the CCITT G.721 standard 32 kb/s ADPCM coder for almost all conditions tested. A description of the LD-CELP algorithm, its implementation on the DSP32C for CCITT testing, and performance results from these tests are presented.
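As a rough check on the figures in this abstract, the sketch below works out how long one excitation vector lasts and how many bits it can carry. The 8 kHz sampling rate is an assumption (the usual telephone-band rate); the abstract does not state it. The function names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000   # assumption: telephone-band sampling, not stated in the abstract
BIT_RATE_BPS = 16000    # from the abstract: 16 kb/s
VECTOR_SIZE = 5         # from the abstract: five-sample excitation vectors


def vector_duration_ms(vector_size=VECTOR_SIZE, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time needed to buffer one excitation vector, in milliseconds."""
    return 1000.0 * vector_size / sample_rate_hz


def bits_per_vector(bit_rate_bps=BIT_RATE_BPS, vector_size=VECTOR_SIZE,
                    sample_rate_hz=SAMPLE_RATE_HZ):
    """Bits available to describe each excitation vector at the given bit rate."""
    return bit_rate_bps * vector_size // sample_rate_hz


if __name__ == "__main__":
    print(vector_duration_ms())  # 0.625
    print(bits_per_vector())     # 10
```

Under that assumption, buffering one vector costs 0.625 ms, which fits within the stated one-way delay of less than 2 ms, and each vector is described by 10 bits.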
International Conference on Acoustics, Speech, and Signal Processing | 1999
David Malah; Richard V. Cox; Anthony J. Accardi
Speech enhancement algorithms that are based on estimating the short-time spectral amplitude of the clean speech perform better when a soft-decision gain modification, depending on the a priori probability of speech absence, is used. In previously reported work, a fixed probability, q, is assumed. Since speech is non-stationary and may not be present in every frequency bin when voiced, we propose a method for estimating distinct values of q for different bins, which are tracked in time. The estimation is based on a decision-theoretic approach for setting a threshold in each bin, followed by short-time averaging. The estimated values of q are used to control both the gain and the update of the estimated noise spectrum during speech presence in a modified MMSE log-spectral amplitude estimator. In subjective tests, the method scored higher than the IS-127 standard enhancement algorithm when pre-processing noisy speech for a coding application.
Proceedings of the IEEE | 2000
Richard V. Cox; Candace A. Kamm; Lawrence R. Rabiner; Juergen Schroeter; Jay G. Wilpon
In the future, the world of telecommunications will be vastly different from what it is today. The driving force will be the seamless integration of real-time communications (e.g., voice, video, and music) and data into a single network, with ubiquitous access to that network anywhere, anytime, and by a wide range of devices. The only currently available ubiquitous access device to the network is the telephone, and the only ubiquitous user access technology mode is spoken voice commands and natural language dialogues with machines. In the future, new access devices and modes will augment speech in this role, but are unlikely to supplant the telephone and access by speech anytime soon. Speech technologies have progressed to the point where they are now viable for a broad range of communications services, including: compression of speech for use over wired and wireless networks; speech synthesis, recognition, and understanding for dialogue access to information, people, and messaging; and speaker verification for secure access to information and services. The paper provides brief overviews of these technologies, discusses some of the unique properties of wireless, plain old telephone service, and Internet protocol networks that make voice communication and control problematic, and describes the types of voice services available in the past and today, and those that we foresee becoming available over the next several years.
IEEE Transactions on Signal Processing | 1991
Richard V. Cox; Joachim Hagenauer; Nambirajan Seshadri; Carl-Erik W. Sundberg
The effects of digital transmission errors on a family of variable-rate embedded subband speech coders (SBC) are analyzed in detail. It is shown that there is a difference in error sensitivity of four orders of magnitude between the most and the least sensitive bits of the speech coder. As a result, a family of rate-compatible punctured convolutional codes with flexible unequal error protection capabilities has been matched to the speech coder. These codes are optimally decoded with the Viterbi algorithm. Among the results, analysis and informal listening tests show that with a 4-level unequal error protection scheme, transmission of 12 kb/s speech is possible with very little degradation in quality over a 16 kb/s channel with an average bit error rate (BER) of 2 × 10⁻² at a vehicle speed of 60 m.p.h. and with interleaving over two 16 ms speech frames.
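To illustrate what "rate-compatible punctured" means, the sketch below punctures the output of a mother code with a family of patterns in which every bit sent at a higher rate is also sent at every lower rate. The rate-1/2 mother code and the three patterns are hypothetical choices for illustration; the abstract does not give the codes actually used.

```python
from fractions import Fraction

MOTHER_RATE = Fraction(1, 2)  # assumption: hypothetical rate-1/2 mother code

# Hypothetical puncturing patterns over 8 coded bits (4 information bits).
# A 1 means "transmit this coded bit", a 0 means "delete it".
PATTERNS = [
    [1, 1, 1, 1, 1, 1, 1, 1],  # strongest protection, rate 1/2
    [1, 1, 1, 0, 1, 1, 1, 0],  # rate 2/3
    [1, 1, 1, 0, 1, 0, 1, 0],  # weakest protection, rate 4/5
]


def puncture(coded_bits, pattern):
    """Keep only the coded bits selected by the cyclically repeated pattern."""
    return [bit for i, bit in enumerate(coded_bits) if pattern[i % len(pattern)]]


def code_rate(pattern, mother_rate=MOTHER_RATE):
    """Information bits per transmitted bit after puncturing."""
    return Fraction(len(pattern)) * mother_rate / sum(pattern)


def is_rate_compatible(stronger, weaker):
    """True if every bit sent by the weaker code is also sent by the stronger one."""
    return all(s >= w for s, w in zip(stronger, weaker))


if __name__ == "__main__":
    for pattern in PATTERNS:
        print(code_rate(pattern))
    # Average code rate implied by 12 kb/s speech on a 16 kb/s channel,
    # ignoring any overhead bits (the abstract does not detail them).
    print(Fraction(12, 16))
```

Because the patterns are nested, one encoder and one Viterbi decoder can serve all protection levels, with the most error-sensitive speech bits assigned to the lowest-rate pattern.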
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1986
Richard V. Cox
In the past, quadrature mirror filters (QMFs) have been used to derive both uniformly and nonuniformly spaced filterbanks. A pair of QMFs divides a signal into two equal bands which can be decimated at 2:1 and subsequently combined to reconstruct the original signal. In order to derive filterbanks with more than two bands, QMFs are combined in a binary tree structure. Pseudoquadrature mirror filters are similar to QMFs but can be designed to split a signal directly into any number of equally spaced bands, thus generalizing the QMF concept. In this paper, the theory of pseudoquadrature mirror filters is reviewed. These filters retain the desirable property that the channel signals from a uniformly spaced bank of M filters can be decimated by M:1, then interpolated and reassembled to reproduce the original signal. An extension is made to the theory to allow a set of nonuniformly spaced filters to be derived from a uniformly spaced set and still retain all the desirable characteristics. Another extension to the theory is the derivation of a family of different sized filterbanks, all derived from the same original prototype. Potential applications for the new filterbanks include improvements in subband coding of speech and music, and analog scrambling of speech.
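The two-band split and binary-tree construction described in this abstract can be sketched with the simplest possible QMF pair, the two-tap (Haar) filters. This is an illustrative stand-in, not the pseudo-QMF design of the paper: the Haar pair reconstructs exactly, whereas longer QMF designs generally reconstruct only approximately.

```python
import math

S = 1.0 / math.sqrt(2.0)  # normalization of the two-tap (Haar) filter pair


def analyze(x):
    """Split an even-length signal into low and high bands, each decimated 2:1."""
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high


def synthesize(low, high):
    """Interpolate and recombine the two bands into the original signal."""
    x = []
    for lo, hi in zip(low, high):
        x.append(S * (lo + hi))
        x.append(S * (lo - hi))
    return x


def tree_analyze(x, depth):
    """Binary tree of two-band splits; returns 2**depth bands.

    The bands come out in tree order, which is not natural frequency order.
    """
    if depth == 0:
        return [x]
    low, high = analyze(x)
    return tree_analyze(low, depth - 1) + tree_analyze(high, depth - 1)


def tree_synthesize(bands):
    """Undo tree_analyze by recombining bands pairwise up the tree."""
    if len(bands) == 1:
        return bands[0]
    half = len(bands) // 2
    return synthesize(tree_synthesize(bands[:half]), tree_synthesize(bands[half:]))


if __name__ == "__main__":
    signal = [1.0, 4.0, -2.0, 3.0, 0.5, 0.0, 7.0, -1.0]
    bands = tree_analyze(signal, 2)   # four bands of two samples each
    print(tree_synthesize(bands))     # matches signal up to rounding
```

The tree yields only power-of-two band counts, which is the limitation that a direct M-band pseudo-QMF design removes.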
Vehicular Technology Conference | 1994
Richard V. Cox; Carl-Erik W. Sundberg
Viterbi decoding algorithms for convolutional codes are being considered for a number of applications in cellular mobile radio systems. There are three classes of Viterbi decoders depending on the nature of the formatting of the data: continuous decoding with a finite path memory, blockwise decoding with a terminating tail (known to the decoder), and blockwise decoding without a known tail. The latter class is also known as decoding of tailbiting convolutional codes. In this case, a coded message begins and ends in the same state which is unknown to the receiver. The authors present a class of Viterbi algorithms for tailbiting convolutional codes. These algorithms are used in blockwise transmission to save the overhead of a known tail. They call the new algorithm the circular Viterbi algorithm (CVA). The basic ideas are: (1) continue conventional seamless continuous Viterbi decoding beyond the block boundary by recording and repeating the received block of (soft) symbols; (2) start the decoding process in all states; and (3) end the decoding process either adaptively or with a fixed length. Three robust adaptive stopping rules are constructed and evaluated. Simulation results and comparison to previously known algorithms as well as the optimum algorithm are presented. The amount of computation required for previously reported iterative algorithms tends to increase dramatically as the channel bit error rate (BER) increases. In one reported instance, computation increased by over 900% while decoded BER increased from 8 × 10⁻⁶ to 8 × 10⁻³. For the same example, the CVA increase in computation was 11.4% and the worst case decoded BER was 4 × 10⁻³. The authors conclude that for noisy channels the CVA decodes in a much shorter time with better performance than previously published iterative algorithms.
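The three basic ideas listed in this abstract can be sketched for a toy code. The example below assumes a rate-1/2, memory-2 code with generators (7, 5) in octal, hard-decision (Hamming) metrics, and a fixed number of repetitions of the block; none of these choices comes from the paper, and the adaptive stopping rules are not shown.

```python
GENERATORS = (0b111, 0b101)  # assumption: toy rate-1/2 code, generators (7, 5) octal
MEMORY = 2
N_STATES = 1 << MEMORY


def step(state, bit):
    """Advance the encoder by one input bit; return (next_state, output_bits)."""
    register = (bit << MEMORY) | state
    out = tuple(bin(register & g).count("1") & 1 for g in GENERATORS)
    return register >> 1, out


def tailbiting_encode(msg):
    """Encode so that the start and end states coincide (no known tail is sent)."""
    state = 0
    for bit in msg[-MEMORY:]:          # preload the register with the last bits
        state, _ = step(state, bit)
    start = state
    coded = []
    for bit in msg:
        state, out = step(state, bit)
        coded.extend(out)
    return coded, start, state


def circular_viterbi(received, n_bits, wraps=3):
    """Fixed-length sketch: decode the repeated block, keep the middle copy."""
    symbols = [tuple(received[2 * i:2 * i + 2]) for i in range(n_bits)] * wraps
    metrics = [0] * N_STATES           # idea (2): start in all states
    history = []
    for sym in symbols:                # idea (1): run on past the block boundary
        new_metrics = [None] * N_STATES
        back = [None] * N_STATES
        for state in range(N_STATES):
            for bit in (0, 1):
                nxt, out = step(state, bit)
                cost = metrics[state] + sum(a != b for a, b in zip(out, sym))
                if new_metrics[nxt] is None or cost < new_metrics[nxt]:
                    new_metrics[nxt] = cost
                    back[nxt] = (state, bit)
        metrics = new_metrics
        history.append(back)
    state = min(range(N_STATES), key=lambda s: metrics[s])
    decoded = []
    for back in reversed(history):     # trace back along the best survivor
        state, bit = back[state]
        decoded.append(bit)
    decoded.reverse()
    middle = (wraps // 2) * n_bits     # idea (3): fixed-length ending
    return decoded[middle:middle + n_bits]


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1, 0]
    coded, start, end = tailbiting_encode(msg)
    print(start == end)                           # True
    print(circular_viterbi(coded, len(msg)) == msg)  # True
```

Taking decisions from the middle copy lets the survivors settle before and after the kept segment, so the unknown starting state has little influence on the output.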
IEEE Communications Magazine | 1996
Richard V. Cox; P. Kroon
The International Telecommunication Union (ITU) has standardized three speech coders which are applicable to low-bit-rate multimedia communications. ITU Rec. G.729 8 kb/s CS-ACELP has a 15 ms algorithmic codec delay and provides network-quality speech. It was originally designed for wireless applications, but is applicable to multimedia communications as well. Annex A of Rec. G.729 is a reduced-complexity version of the CS-ACELP coder. It was designed explicitly for simultaneous voice and data applications that are prevalent in low-bit-rate multimedia communications. These two coders use the same bitstream format and can interoperate. The ITU Rec. G.723.1 6.3 and 5.3 kb/s speech coder for multimedia communications was designed originally for low-bit-rate videophones. Its frame size of 30 ms and one-way algorithmic codec delay of 37.5 ms allow for a further reduction in bit rate compared to the G.729 coder. In applications where low delay is important, the delay of G.723.1 may be too large. However, if the delay is acceptable, G.723.1 provides a lower-complexity alternative to G.729 at the expense of a slight degradation in quality. This article describes the attributes of speech coders such as bit rate, complexity, delay, and quality. Then it discusses the basic concepts of the three new ITU coders by comparing their specific attributes. The second part of this article describes the standardization process for each of these coders.
IEEE Transactions on Speech and Audio Processing | 2001
Hong Kook Kim; Richard V. Cox
We propose a feature extraction method for a speech recognizer that operates in digital communication networks. The feature parameters are basically extracted by converting the quantized spectral information of a speech coder into a cepstrum. We also include the voiced/unvoiced information obtained from the bitstream of the speech coder in the recognition feature set. We performed speaker-independent connected digit HMM recognition experiments under clean, background noise, and channel impairment conditions. From these results, we found that the speech recognition system employing the proposed bitstream-based front-end gives superior word and string accuracies over a recognizer constructed from decoded speech signals. Its performance is comparable to that of a wireline recognition system that uses the cepstrum as a feature set. Next, we extended the evaluation of the proposed bitstream-based front-end to large vocabulary speech recognition with a name database. The recognition results proved that the proposed bitstream-based front-end also gives a comparable performance to the conventional wireline front-end.
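The abstract does not spell out how the quantized spectral information is converted into a cepstrum. One standard route, once the decoded parameters are expressed as LPC coefficients, is the LPC-to-cepstrum recursion sketched below. The sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p and the function name are assumptions made for this illustration.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c1..c_n_ceps of the all-pole filter 1/A(z).

    a holds [a1, ..., ap] for A(z) = 1 + a1*z^-1 + ... + ap*z^-p
    (assumed sign convention; some texts use the opposite sign).
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        # Sum over k of (k/n) * c_k * a_(n-k), restricted to valid indices.
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


if __name__ == "__main__":
    # Single pole at z = 0.5: the cepstrum is 0.5**n / n.
    print(lpc_to_cepstrum([-0.5], 3))
```

For a single pole at 0.5 the recursion reproduces the closed form 0.5^n / n, which is a convenient sanity check before applying it to a higher-order filter.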
International Conference on Acoustics, Speech, and Signal Processing | 1989
Bishnu S. Atal; Richard V. Cox; Peter Kroon
The authors present results on the comparative performance of nonuniform scalar quantizers using three different LPC (linear predictive coding) representations: the arcsine of reflection coefficients, the log area ratios, and the line spectral frequencies. On comparing the spectral distortion introduced by quantizers based on these representations, it was found that the average distortion was very similar for all three, with the arcsine showing fewer large spectral errors. In a parallel study, the performance of the above LPC representations and the autocorrelation coefficients for interpolating the spectrum between adjacent time frames was investigated and revealed only small differences between the different representations. Informal listening tests with a complete 8 kb/s code-excited linear predictive (CELP) coder, incorporating both quantization and interpolation, showed no significant differences between the various LPC representations, suggesting that the random codebook for the excitation is able to compensate for small spectral deviations.
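Two of the representations compared in this abstract are simple pointwise transforms of the reflection coefficients, and can be sketched as follows. The sign convention for the log area ratio varies between texts; the one used here, log((1 + k) / (1 - k)), is an assumption, as are the function names.

```python
import math


def to_arcsine(k):
    """Arcsine representation of a reflection coefficient, |k| < 1."""
    return math.asin(k)


def from_arcsine(x):
    """Recover the reflection coefficient from its arcsine representation."""
    return math.sin(x)


def to_log_area_ratio(k):
    """Log area ratio, using the assumed convention log((1 + k) / (1 - k))."""
    return math.log((1.0 + k) / (1.0 - k))


def from_log_area_ratio(g):
    """Recover the reflection coefficient from its log area ratio."""
    return math.tanh(g / 2.0)


if __name__ == "__main__":
    for k in (-0.95, -0.3, 0.0, 0.6, 0.98):
        print(k, to_arcsine(k), to_log_area_ratio(k))
```

Both transforms stretch the region near |k| = 1, where the spectrum is most sensitive to quantization error, so a quantizer applied in the transformed domain spends more of its levels there.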
IEEE Communications Magazine | 1997
Richard V. Cox
Many new speech coding standards have been created in the 10-year period 1987-1996. The author reviews the key attributes that determine what coder to select for different applications. The article then focuses on three new speech coding recommendations from the ITU-T, namely G.723.1, G.729, and Annex A of G.729. They provide good coverage for a wide range of applications that have low bit rate requirements (i.e., from 5.3 to 8 kb/s). In addition to bit rate, the article reviews their delay, complexity, and performance. Also reviewed are the history of these standards, and what considerations influenced the requirements each of these coders had to meet.