AES E-Library

Convolutional Transformer for Neural Speech Coding

In this paper, we propose a Convolutional-Transformer speech codec that uses stacks of convolutions and self-attention layers to remove redundant information at the downsampling and upsampling blocks of a U-Net-style encoder-decoder neural codec architecture. We design the Transformers to use channel and temporal attention with any number of attention stages and heads while maintaining causality. This allows the model to account for the characteristics of the input vectors and to flexibly exploit temporal and channel-wise relationships at different scales when encoding the salient information present in speech. As a result, the model can reduce the dimensionality of its latent embeddings and improve its quantization efficiency while maintaining quality. Experimental results demonstrate that our approach achieves significantly better performance than convolution-only baselines.
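
As a rough illustration of the channel and temporal attention described above, the following Python/PyTorch sketch shows one way a causal temporal-attention stage could be combined with a channel-attention stage after a convolutional downsampling block. It is not the authors' implementation: the module layout, head count, normalization, and the simple dot-product channel mixing are assumptions made for illustration only.

    # A minimal sketch of combining causal temporal attention with channel
    # attention, in the spirit of the abstract. NOT the authors' implementation:
    # layer choices and the dot-product channel mixing are illustrative assumptions.
    import torch
    import torch.nn as nn


    class CausalTemporalChannelAttention(nn.Module):
        """Causal self-attention over time followed by attention over channels."""

        def __init__(self, channels: int, num_heads: int = 4):
            super().__init__()
            self.temporal_attn = nn.MultiheadAttention(
                embed_dim=channels, num_heads=num_heads, batch_first=True
            )
            self.temporal_norm = nn.LayerNorm(channels)
            self.channel_norm = nn.LayerNorm(channels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, channels), e.g. the output of a convolutional
            # downsampling block in a U-Net-style encoder.
            b, t, c = x.shape

            # Temporal stage: a boolean upper-triangular mask keeps each frame
            # from attending to future frames, preserving causality.
            causal_mask = torch.triu(
                torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
            )
            attn_out, _ = self.temporal_attn(x, x, x, attn_mask=causal_mask)
            x = self.temporal_norm(x + attn_out)

            # Channel stage: treat each channel as a token and mix channels with
            # scaled dot-product weights. Note these mixing weights are computed
            # over the whole clip; a streaming-causal variant would restrict
            # them to past frames.
            q = x.transpose(1, 2)                                             # (batch, channels, time)
            scores = torch.softmax(q @ q.transpose(1, 2) / t ** 0.5, dim=-1)  # (batch, channels, channels)
            channel_out = (scores @ q).transpose(1, 2)                        # (batch, time, channels)
            return self.channel_norm(x + channel_out)


    if __name__ == "__main__":
        block = CausalTemporalChannelAttention(channels=64, num_heads=4)
        frames = torch.randn(2, 100, 64)   # (batch, time, channels)
        print(block(frames).shape)         # torch.Size([2, 100, 64])

In the paper's terms, such a block could be stacked with any number of attention stages and heads at each scale of the encoder and decoder; the sketch shows only a single stage of each attention type.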

Permalink: https://aes2.org/publications/elibrary-page/?id=22249



E-Library location: 16938