AES E-Library

Generating Music Reactive Videos by Applying Network Bending to Stable Diffusion

This paper presents the first steps toward the creation of a tool that enables artists to create music visualizations using pretrained, generative, machine learning models. First, the authors investigate the application of network bending, the process of applying transforms within the layers of a generative network, to image-generation diffusion models by utilizing a range of point-wise, tensor-wise, and morphological operators. A number of visual effects that result from various operators, including some that are not easily recreated with standard image editing tools, are identified. The authors find that this process allows for continuous, fine-grained control of image generation, which can be helpful for creative applications. Next, music-reactive videos are generated using Stable Diffusion by passing audio features as parameters to network bending operators. Finally, the authors comment on certain transforms that radically shift the image and on the possibilities of learning more about the latent space of Stable Diffusion based on these transforms. This paper is an extended version of the paper “Network Bending of Diffusion Models,” which appeared in the 27th International Conference on Digital Audio Effects.
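To illustrate the idea described in the abstract, the following is a minimal sketch of a point-wise network-bending operator driven by an audio feature. It is not the authors' implementation: the function names, the choice of RMS energy as the audio feature, and the NumPy array standing in for an intermediate diffusion-model activation are all illustrative assumptions.

```python
import numpy as np

def pointwise_scale(activation, r):
    # Hypothetical point-wise network-bending operator: scale every
    # element of an intermediate activation by a factor derived from
    # an audio feature r, so louder audio bends the image more.
    return activation * (1.0 + r)

def rms(frame):
    # Root-mean-square energy of an audio frame, a common audio feature.
    return float(np.sqrt(np.mean(frame ** 2)))

# Toy tensor standing in for an intermediate activation inside the
# diffusion model (in practice this would be captured from a layer
# of the Stable Diffusion U-Net during denoising).
activation = np.ones((4, 8, 8))
frame = np.sin(np.linspace(0.0, 2.0 * np.pi, 512))  # one audio frame
bent = pointwise_scale(activation, rms(frame))
```

In a music-reactive video pipeline along these lines, one such operator call would run per video frame, with `r` computed from the audio aligned to that frame.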

Permalink: https://aes2.org/publications/elibrary-page/?id=22920


E-Library location: 16938