Overview

This page hosts listening examples and sample interactions for the paper "Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators". Curated highlights appear first (cherry-picked text-to-music samples and live generative delay demos), followed by uncurated evaluation samples.

Overview video

Overview of the Live Music Diffusion Models project, generative delay interactions, and supplementary materials on this page.

Cherry-picked examples

A curated set of text-to-music generations spanning diverse genres, tempos, and moods, selected to showcase the model's range.

Prompt Sample
This song is an exciting and driving progressive house song. It has a driving kick drum, soaring arpeggios, and a tempo of 140 BPM. Perfect for dancing!
Glitch, Drum and Bass, 180 BPM, industrial, dark, mood, high intensity, fast, pulsating high-hats
Relaxing acoustic guitar, nylon strings, fingerpicking, 110 BPM. E minor key.
Hit Em, a tempo of 160 BPM, hardstyle, crunchy, fast, driving, heavy kicks, dense hi-hats
This music is instrumental with a genre of pop, folk, singer-songwriter, and British folk influences. It sets a mellow mood with a time signature of 4/4 and a tempo of 60.0 bpm, suitable for a coffee shop ambiance or as background music at a charity event.
The instrumental music piece is characterized by its ambient genre, conveying a calm mood through the use of piano and synth sounds. The scenario depicted seems to be set in space, possibly depicting an interstellar journey or exploration given the cosmic feel of the music.
Jungle! Big wahhs, heavy bass hits and kicks, glitches, filter sweeps, tempo of 180 BPM
A melodic trance DJ set, with euphoric, clean synths, and a pulsating danceable beat, 130 BPM, some fun arpeggios sporadically
Chill lo-fi beats, laid-back drums, jazzy chords, 90 BPM, lowpass filter
120 BPM atmospheric psytrance track with a wide stereo image, intricate percussion, and evolving textural pads, A minor

LMDM Generative Delay Interactions

Sessions used the real-time generative delay deployment described in the paper; screen captures are hosted on YouTube.

Guitarist demo: Sebastian Franjou (Jamendo-trained LMDM, Enc-Dec, ARC-forced, 230/10 context/block_size)

This clip shows guitarist Sebastian Franjou using the generative delay effect, with the model prompted by "EDM, strong sub bass groove, soaring synth leads". This model has a one-second delay. Notice how the delay effect transforms his guitar signal while maintaining a strong rhythmic structure in the low end.

Saxophonist demo: Matthew Michalek (Jamendo-trained LMDM, Enc-Dec, ARC-forced, 230/10 context/block_size, drum loop passed through sketch conditions for rhythmic consistency)

This clip shows saxophonist Matthew Michalek using the generative delay effect with the same prompt as above; this model also has a one-second delay. Notice how, at 2:25, he discovers that playing a particular note triggers a deep sub bass punch, while the model also echoes his playing back as a delayed synth lead. Together these let him create an orchestrated and dynamic mix on the fly.

Cellist: Valerie K. Chen (LMDM trained on humpback whale calls, Enc-Dec, not ARC-forced, 255/47 context/block_size)

This clip documents cellist Valerie K. Chen in the first performance with the generative delay model trained on humpback whale calls. The whale sounds start to come in at 2:18.

Saxophonist: Matthew Michalek, birdsong (FSD50K-trained LMDM, Enc-Dec, not ARC-forced, 255/47 context/block_size)

This clip demonstrates the foley model being prompted to regenerate Matthew Michalek's saxophone playing as birdsong. This model has a three-second delay.

Saxophonist: Matthew Michalek, chimes (FSD50K-trained LMDM, Enc-Dec, not ARC-forced, 255/47 context/block_size)

This clip demonstrates the foley model being prompted to regenerate Matthew Michalek's saxophone playing as chimes.

Text-conditioned generation

Uncurated comparison samples. Each row is one text prompt. Columns compare the fine-tuned LMDM variants against the fine-tuned + ARC-forced LMDM variants in block-causal (BC) and encoder-decoder (ED) configurations. Each cell shows two samples: primed (model given an audio context prefix) and unprimed (generation from scratch).

Prompt | LMDM (BC) | LMDM (ED) | LMDM + ARC-forced (BC) | LMDM + ARC-forced (ED)
Ambient · electronic · soundtrack. Outer-space mood; otherworldly landscapes in science fiction settings. 4/4. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Easy listening · electronic. Calm, dreamlike, serene soundscape. 4/4 · ~129 BPM. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Classical · piano. Lively interlude with a sense of resolution; drama TV closing scene. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Electronic · soundtrack. Calm mood; drama movie scene. 4/4 · 80 BPM. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Techno · electronic · trance. G minor · 125 BPM. Deep, emotional, powerful; beach / summer setting. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Ambient · video game soundtrack. Synth melody · C major · 86 BPM. Upbeat, action-packed scene. 4/4. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Deep house · electronic. Warm, groovy. Bass · drums · synth. 4/4 · 109 BPM. Futuristic club. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Classical · ambient. Dominant piano. Calming, relaxing; spa or meditation. Slow-paced · 4/4. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Blues · classic rock · pop rock. Uplifting. 4/4 · 136 BPM. Character beginning a journey or adventure. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed
Classical · piano. Emotional. 4/4 · 125 BPM. Steady, measured; reflective or narrative scenarios. | primed / unprimed | primed / unprimed | primed / unprimed | primed / unprimed

Time-varying (prompt transitions)

Each clip is generated by LMDM (Enc–Dec, ARC-forced, block size 192/48) with a mid-piece prompt switch from A → B. The model receives the new text condition at the transition point and must smoothly continue generation.

Prompt A → Prompt B LMDM (ED, ARC-forced)
A: Ambient video game · synth · C major · 86 BPM · upbeat action scene
B: Classical ambient · piano · calming · spa / meditation · slow 4/4
A: Classical piano · lively interlude · resolution · drama TV closing scene
B: Classical piano · emotional · 4/4 · 125 BPM · reflective / narrative
A: Electronic soundtrack · calm · drama movie scene · 80 BPM · 4/4
B: Blues / classic rock / pop rock · uplifting · 136 BPM · character starting a journey
A: Easy listening · electronic · calm · dreamlike · ~129 BPM · serene spa
B: Jazz / world music · relaxed · bouncy 4/4 · café / beach party
A: Ambient electronic · outer space · melancholic · 103 BPM · spa / sauna
B: Orchestral classical · woodwinds + harp · proud · royal procession · 4/4
A: Electronic / classical · harp · strings · piano · mysterious film score · ~140 BPM
B: Electronic funk · synth · C major · 125 BPM · deep summer / beach party
A: French accordion · jazz / manouche / swing · 80 BPM · festive outdoor market
B: Celtic harp · classical · dreamy + festive · medieval fantasy soundtrack
A: Folk / indie · acoustic guitar + tabla · tranquil · 130 BPM · coffee shop
B: Glitch / IDM electronic · moody · ~109 BPM · exploring a new city
A: Classical flute · F# major · nostalgic / romantic · 69 BPM · film reminiscence
B: Techno / deep house · D minor · synth · 4/4 · happy summer / dance club
A: Techno · E minor · 130 BPM · heavy drums · confused mood · cyberpunk game
B: Electronic dance · C# minor · 125 BPM · synth melody · summer / beach party
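
The mid-piece prompt switch in this section can be pictured as a per-block schedule of text conditions. The following is only a minimal sketch, assuming the text condition is chosen once per generated block; the function name `prompt_schedule` and its parameters (`num_blocks`, `switch_block`) are hypothetical and are not taken from the paper or its code.

```python
from typing import List


def prompt_schedule(prompt_a: str, prompt_b: str,
                    num_blocks: int, switch_block: int) -> List[str]:
    """Return the text condition used for each generated block.

    Blocks before `switch_block` use prompt A; blocks from `switch_block`
    onward use prompt B. (Hypothetical helper, for illustration only.)
    """
    if not 0 <= switch_block <= num_blocks:
        raise ValueError("switch_block must lie within [0, num_blocks]")
    return [prompt_a if i < switch_block else prompt_b
            for i in range(num_blocks)]


# Example: six blocks with the switch halfway through the piece.
schedule = prompt_schedule("ambient synth", "calm piano",
                           num_blocks=6, switch_block=3)
for block_index, prompt in enumerate(schedule):
    print(block_index, prompt)
```

In a setup like this, previously generated audio would stay in the model's context while only the text condition changes, which is why the continuation has to bridge the two prompts on its own.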

Accompaniment generation

Each row is a different accompaniment example. Columns vary the stem lookahead: how far into the future of the conditioning stem the model can see at inference time.

+2 s lookahead: The model sees 2 seconds ahead of the current playback position in the stem it accompanies.
0 s lookahead: The model sees the stem only up to the current playback position, with no future information.
−2 s lookahead: The model sees the stem only up to 2 seconds before the current playback position; fully causal with an added buffer delay.
Example | Lookahead +2 s | Lookahead 0 s | Lookahead −2 s
Example 1
Example 2
Example 3
Example 4
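
The three lookahead settings above differ only in how much of the conditioning stem is visible at a given playback position. The sketch below illustrates that arithmetic under stated assumptions: the helper `visible_stem_samples`, the 44.1 kHz sample rate, and the 30-second stem are all hypothetical choices for illustration, not details from the paper.

```python
def visible_stem_samples(playback_pos_s: float, lookahead_s: float,
                         stem_len_samples: int,
                         sample_rate: int = 44100) -> int:
    """Number of leading stem samples the model may condition on.

    A positive lookahead exposes future stem audio, zero exposes the stem up
    to the playback position, and a negative lookahead hides the most recent
    audio. The result is clamped to the valid range of the stem.
    """
    visible_end_s = playback_pos_s + lookahead_s
    end = int(round(visible_end_s * sample_rate))
    return max(0, min(end, stem_len_samples))


# Assumed example: a 30-second stem, playback currently at 10 seconds.
stem_len = 30 * 44100
for lookahead in (2.0, 0.0, -2.0):
    n = visible_stem_samples(10.0, lookahead, stem_len)
    print(f"lookahead {lookahead:+.0f} s -> {n / 44100:.0f} s of stem visible")
```

With these assumed numbers, the model would see 12, 10, and 8 seconds of the stem respectively; early in a piece, a negative lookahead clamps to zero visible samples.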

How to cite

If you use this work, please cite our paper.

@article{novack2026lmdm,
  title         = {Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators},
  author        = {Novack, Zachary and Brade, Stephen and Kim, Haven and Flores Garc{\'i}a, Hugo and Shikarpur, Nithya and Talegaonkar, Chinmay and Kim, Suwan and Chen, Valerie K. and McAuley, Julian and Berg-Kirkpatrick, Taylor and Huang, Cheng-Zhi Anna},
  journal       = {arXiv preprint arXiv:2605.22717},
  year          = {2026},
  archivePrefix = {arXiv},
  eprint        = {2605.22717},
  primaryClass  = {cs.SD},
  url           = {https://arxiv.org/abs/2605.22717}
}