This paper was accepted at the NeurIPS 2023 Workshop on Diffusion Models.
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips. We achieve this by applying guidance at sampling time in a simple framework that supports both reconstruction and classification losses, or any combination of the two. This approach ensures that generated audio can match its surrounding context, or conform to a class distribution or latent representation specified relative to any suitable pre-trained classifier or embedding model.
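As a rough illustration of how a reconstruction term and an embedding term can be combined into one guidance gradient, here is a minimal toy sketch. It is not the paper's implementation: the diffusion model is omitted entirely, the signal is a short list of floats, and the "embedding" is simply the mean of the signal, a hypothetical stand-in for a pre-trained classifier or embedding model. The function names and the weights `w_rec`, `w_emb`, and `step` are assumptions made for this example.

```python
def reconstruction_grad(x, context, keep):
    """Gradient of sum((x_i - context_i)^2) taken over the kept (known) positions."""
    return [2.0 * (xi - ci) if k else 0.0 for xi, ci, k in zip(x, context, keep)]


def embedding_grad(x, target):
    """Gradient of (mean(x) - target)^2; mean(x) is a toy stand-in embedding."""
    n = len(x)
    diff = sum(x) / n - target
    return [2.0 * diff / n] * n


def guided_step(x, context, keep, target, w_rec=1.0, w_emb=1.0, step=0.1):
    """Move x one step down the gradient of the weighted sum of both losses."""
    g_rec = reconstruction_grad(x, context, keep)
    g_emb = embedding_grad(x, target)
    return [
        xi - step * (w_rec * gr + w_emb * ge)
        for xi, gr, ge in zip(x, g_rec, g_emb)
    ]


# Toy usage: the first two samples are known context, the last two are free.
x = [0.0, 0.0, 0.0, 0.0]
context = [1.0, 1.0, 0.0, 0.0]
keep = [True, True, False, False]
x = guided_step(x, context, keep, target=0.0, w_emb=0.0, step=0.25)
```

Setting `w_emb=0.0` gives pure reconstruction guidance and `w_rec=0.0` gives pure embedding guidance; any other weighting mixes the two, which is the sense in which the losses can be combined.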
We show randomly chosen samples for different creative applications in Table 1, each conditioned on a given audio prompt. For each task and prompt we show samples from the different models described in the paper.
Task types:
infill: replace the middle two seconds of the prompt
regeneration: regenerate the middle two seconds of the prompt
continuation: generate a new continuation starting from the first 2.4s of the prompt
transitions: regenerate a crossfaded section between two tracks
guidance: generate a new clip conditioned on the PaSST classifier embedding of the prompt
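To make the time regions in the list above concrete, the following sketch computes which sample indices would be generated and which would be kept as context for the middle-two-seconds and continuation cases at 44.1kHz. The total clip duration is not stated in the text, so it is a parameter here, and the 6-second value in the usage lines is only an assumption; the helper names are hypothetical.

```python
SAMPLE_RATE = 44_100  # Hz, as stated in the text


def middle_region(total_seconds, region_seconds=2.0, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices of a centred region to be generated."""
    total = round(total_seconds * sample_rate)
    region = round(region_seconds * sample_rate)
    if region > total:
        raise ValueError("region is longer than the clip")
    start = (total - region) // 2
    return start, start + region


def continuation_region(total_seconds, kept_seconds=2.4, sample_rate=SAMPLE_RATE):
    """Return (start, end) of the part generated after the kept opening."""
    total = round(total_seconds * sample_rate)
    kept = round(kept_seconds * sample_rate)
    if kept > total:
        raise ValueError("kept opening is longer than the clip")
    return kept, total


def keep_mask(total_samples, region):
    """True where the prompt is kept as context, False where audio is generated."""
    start, end = region
    return [not (start <= i < end) for i in range(total_samples)]


# Usage with an assumed 6-second clip.
infill = middle_region(6.0)              # (88200, 176400)
continuation = continuation_region(6.0)  # (105840, 264600)
```

For stereo audio the same mask would apply to both channels.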
Prompts are drawn from a test split of the Free Music Archive dataset, published by Michaël Defferrard et al. under a Creative Commons Attribution 4.0 International License (CC BY 4.0).