AES E-Library


1D Convolutional Layers to Create Frequency-Based Spectral Features for Audio Networks

Time-frequency transformations and spectral representations of audio signals are commonly used in machine learning applications. Training networks on frequency features such as the Mel-Spectrogram or Chromagram has proven more effective and convenient than training on time samples. In practical realizations, these features are created on a different processor and/or pre-computed and stored on disk, which requires additional effort and makes it difficult to experiment with different combinations. In this paper, we provide a PyTorch framework for creating spectral features and time-frequency transformations using the built-in trainable conv1d() layer. This allows the features to be computed on the fly as part of a larger network and makes it easier to experiment with various parameters. Our work extends earlier work in the literature with the same aim in two ways: first, by adding more of these features; and second, by allowing the kernels either to be trained from initialized values or to be trained from random values and converge to the desired solution. The code is written as a template of classes and scripts that users may integrate into their own PyTorch classes for various applications.
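As a rough illustration of the idea, and not the paper's own code, the sketch below shows one way a magnitude spectrogram might be computed with a trainable 1D convolution whose kernels are initialized to a Hann-windowed DFT basis. The names `dft_kernels`, `ConvSpectrogram`, `n_fft`, `hop`, and `init_dft` are hypothetical and chosen only for this example; the window choice and the absence of padding are likewise assumptions.

```python
import math

import torch
import torch.nn as nn


def dft_kernels(n_fft: int) -> torch.Tensor:
    """Hann-windowed DFT basis shaped (2 * n_bins, 1, n_fft) for nn.Conv1d.

    Hypothetical helper for this sketch. Computed in float64 to keep the
    phase accurate, then cast to float32.
    """
    n_bins = n_fft // 2 + 1
    n = torch.arange(n_fft, dtype=torch.float64)
    k = torch.arange(n_bins, dtype=torch.float64).unsqueeze(1)
    window = torch.hann_window(n_fft, periodic=True, dtype=torch.float64)
    angle = 2.0 * math.pi * k * n / n_fft
    real = torch.cos(angle) * window    # real part of the DFT basis
    imag = -torch.sin(angle) * window   # imaginary part of the DFT basis
    return torch.cat([real, imag], dim=0).unsqueeze(1).float()


class ConvSpectrogram(nn.Module):
    """Magnitude spectrogram computed by a trainable nn.Conv1d layer.

    Hypothetical class for this sketch, not the framework's API.
    """

    def __init__(self, n_fft: int = 512, hop: int = 128, init_dft: bool = True):
        super().__init__()
        self.n_bins = n_fft // 2 + 1
        # One output channel per real and per imaginary basis function.
        self.conv = nn.Conv1d(1, 2 * self.n_bins, kernel_size=n_fft,
                              stride=hop, bias=False)
        if init_dft:
            # Start from the analytic kernels; with init_dft=False the layer
            # keeps PyTorch's default random initialization instead.
            with torch.no_grad():
                self.conv.weight.copy_(dft_kernels(n_fft))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, samples) -> (batch, n_bins, frames)
        y = self.conv(x.unsqueeze(1))
        real, imag = y[:, :self.n_bins], y[:, self.n_bins:]
        # Small constant keeps the gradient of sqrt finite at zero.
        return torch.sqrt(real ** 2 + imag ** 2 + 1e-12)
```

Because `nn.Conv1d` computes a strided cross-correlation, each output frame is the inner product of a signal segment with every kernel, which is what a windowed DFT does. The kernels remain ordinary trainable parameters, so such a layer could be placed at the front of a larger network and either fine-tuned from this initialization or trained from random values.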



