TITLE: A trellis-searched 16 kbit/sec speech coder with low delay
AUTHORS: M. W. Marcellin and T. R. Fischer
CONFERENCE: IEEE Workshop on Speech Coding for Telecommunications, Vancouver, B.C., Canada, September 1989
ABSTRACT: Trellis coded quantization (TCQ) was recently introduced [1] as an effective scheme for encoding memoryless sources. TCQ is motivated by Ungerboeck's formulation of trellis coded modulation [2] and uses a structured codebook with an expanded set of quantization levels. Based on Ungerboeck's notion of set partitioning, the trellis structure then prunes this expanded set of quantization levels back down to the desired encoding rate. The use of a deterministic codebook and Viterbi encoding [3] yields a computationally simple encoder.
TCQ has previously been incorporated into a DPCM structure for encoding speech at 2 bits per sample (16 kbps) [4]. This system performs quite well in terms of signal-to-noise ratio (SNR) and segmental SNR (SEGSNR): for a variety of speakers and test sentences, SNR values ranged from 17.52 dB to 21.66 dB and SEGSNR values from 19.13 dB to 21.90 dB. In informal listening tests, the reconstructed speech was judged to be of excellent communications quality.
In the work presented here, we extend the results reported in [4] by incorporating TCQ into a generalized predictive waveform coding structure with backward-adaptive formant and pitch predictors, often referred to as a noise feedback coding structure [5]. Similar results for tree coding have been reported by Iyengar and Kabal [6]. The trellis encoding is performed by a modified Viterbi algorithm that allows flexibility in the choice of symbol release rule. We investigate how the encoding delay and the number of symbols released per trace-back affect system performance and complexity.
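The TCQ idea summarized above (expanded codebook, set partitioning, Viterbi search) can be illustrated with a minimal Python sketch. It quantizes a memoryless source directly, not inside the predictive coder described here, and several details are assumptions rather than the authors' design: a uniform codebook with 2^(R+1) levels and a hypothetical step size `step`, levels labeled D0, D1, D2, D3 cyclically from left to right, and a commonly cited 4-state Ungerboeck-style trellis in which each state offers two of the four subsets. The names `make_codebook`, `nearest_in_subset`, and `tcq_encode` are illustrative only.

```python
import math
import random

# Illustrative 4-state trellis (an assumption, not taken from the paper):
# TRELLIS[state] lists the two branches as (subset index, next state).
TRELLIS = [
    [(0, 0), (2, 1)],
    [(1, 2), (3, 3)],
    [(2, 0), (0, 1)],
    [(3, 2), (1, 3)],
]


def make_codebook(rate, step):
    """Uniform codebook with 2**(rate + 1) levels, twice the 2**rate of a rate-R scalar quantizer."""
    n = 2 ** (rate + 1)
    return [step * (i - (n - 1) / 2) for i in range(n)]


def nearest_in_subset(x, codebook, subset):
    """Closest level in subset D_subset; levels are labeled D0, D1, D2, D3 cyclically."""
    return min(codebook[subset::4], key=lambda c: (x - c) ** 2)


def tcq_encode(samples, codebook, start_state=0):
    """Viterbi search for the minimum squared-error path through the trellis."""
    cost = [math.inf] * 4
    cost[start_state] = 0.0
    paths = [[] for _ in range(4)]  # reconstruction values along each survivor
    for x in samples:
        new_cost = [math.inf] * 4
        new_paths = [None] * 4
        for s in range(4):
            if cost[s] == math.inf:
                continue  # state not yet reachable from start_state
            for subset, nxt in TRELLIS[s]:
                c = nearest_in_subset(x, codebook, subset)
                total = cost[s] + (x - c) ** 2
                if total < new_cost[nxt]:  # keep the better of the branches entering nxt
                    new_cost[nxt] = total
                    new_paths[nxt] = paths[s] + [c]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda s: cost[s])
    return paths[best], cost[best]


if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
    recon, err = tcq_encode(xs, make_codebook(rate=2, step=0.5))
    print("SNR (dB):", 10 * math.log10(sum(x * x for x in xs) / err))
```

Each branch choice costs one bit and each subset holds 2^(R-1) levels, so the sketch still spends R bits per sample even though the search runs over 2^(R+1) levels; this is the sense in which the trellis prunes the expanded codebook back to the target rate.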
In general, increasing the encoding delay increases the SNR, but the improvement saturates quickly: for encoding sampled speech at 16 kbps, nearly all of the gain achievable by the trellis search is obtained with encoding delays of 5 msec or less.
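The abstract does not spell out the modified Viterbi algorithm, so the following is only a hedged sketch of one common way to bound the decision delay of a trellis search, written for plain TCQ on a memoryless source. Assumed details: once `delay` + `release` stages are buffered, the encoder traces back from the currently best survivor, releases the oldest `release` codewords, and discards survivors that do not pass through the trellis state fixed by those released symbols. Every released symbol therefore sees at least `delay` samples of look-ahead, and the oldest in each block waits `delay` + `release` - 1 samples. The trellis, codebook, and names (`tcq_encode_delayed`, `delay`, `release`) are illustrative assumptions. At 2 bits per sample and 16 kbps the sampling rate is 8 kHz, so 5 msec corresponds to 40 samples.

```python
import math
import random

# Illustrative 4-state TCQ trellis (an assumption): (subset index, next state).
TRELLIS = [
    [(0, 0), (2, 1)],
    [(1, 2), (3, 3)],
    [(2, 0), (0, 1)],
    [(3, 2), (1, 3)],
]


def make_codebook(rate, step):
    n = 2 ** (rate + 1)  # expanded codebook: twice the levels of a rate-R quantizer
    return [step * (i - (n - 1) / 2) for i in range(n)]


def nearest_in_subset(x, codebook, subset):
    return min(codebook[subset::4], key=lambda c: (x - c) ** 2)


def tcq_encode_delayed(samples, codebook, delay, release, start_state=0):
    """Viterbi TCQ with a finite decision delay (hypothetical release rule).

    Once delay + release stages are buffered, trace back from the best survivor,
    release the oldest `release` codewords, and drop survivors that disagree.
    """
    cost = [math.inf] * 4
    cost[start_state] = 0.0
    stages = []  # back-pointers per unreleased stage: stages[i][state] = (prev_state, codeword)
    out, tracebacks = [], 0

    def trace(state, depth):
        """Follow back-pointers through the last `depth` stages; results are oldest first."""
        states, words = [state], []
        for bp in reversed(stages[len(stages) - depth:]):
            state, word = bp[state]
            states.append(state)
            words.append(word)
        return states[::-1], words[::-1]

    for x in samples:
        new_cost, bp = [math.inf] * 4, [None] * 4
        for s in range(4):
            if cost[s] == math.inf:
                continue
            for subset, nxt in TRELLIS[s]:
                c = nearest_in_subset(x, codebook, subset)
                if cost[s] + (x - c) ** 2 < new_cost[nxt]:
                    new_cost[nxt], bp[nxt] = cost[s] + (x - c) ** 2, (s, c)
        cost = new_cost
        stages.append(bp)
        if len(stages) >= delay + release:
            tracebacks += 1
            best = min(range(4), key=lambda s: cost[s])
            states, words = trace(best, len(stages))
            out.extend(words[:release])
            anchor = states[release]  # trellis state fixed by the released symbols
            for s in range(4):
                if cost[s] < math.inf and trace(s, len(stages) - release)[0][0] != anchor:
                    cost[s] = math.inf  # survivor is inconsistent with what was released
            del stages[:release]
    if stages:  # end of input: flush the remaining decisions
        best = min(range(4), key=lambda s: cost[s])
        out.extend(trace(best, len(stages))[1])
    return out, tracebacks


def distortion(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
    cb = make_codebook(rate=2, step=0.5)
    for d in (0, 2, 8, 40):
        recon, tb = tcq_encode_delayed(xs, cb, delay=d, release=1)
        snr = 10 * math.log10(sum(x * x for x in xs) / distortion(xs, recon))
        print(f"delay={d:2d} samples  SNR={snr:.2f} dB  trace-backs={tb}")
```

In this sketch, raising `release` reduces the number of trace-backs (roughly (n - `delay`) / `release` for n samples) at the cost of a longer wait for the oldest symbol in each released block, which is one way to picture the performance and complexity trade-off the abstract describes.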