Voices from the Rift

March 28, 2024

In April 2020, a blog post by OpenAI noted: “We’re introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.” The world was going into lockdown over the COVID-19 pandemic, and the remarkable project did not get noticed widely beyond AI enthusiasts.

OpenAI has not pursued music generation any further since then; perhaps their legal department has cautioned them to leave some copyright battles for others to fight. But with all the money flowing into AI, it was only a matter of time before a tool very much like Jukebox became available to everyone.

This is now the case with suno.ai, a startup focused entirely on music creation. You supply a text prompt, and it produces a full song, with or without vocals.

By default, Suno will use a Large Language Model to generate lyrics for your songs. In almost all cases, the resulting lyrics will be very generic cringe. Fortunately, you can supply your own instead.

The quality of the generations varies wildly. Songs may be glitchy, vocals may be out of order, and you might get singers with all kinds of accents, whether you specify that or not. You may generate hundreds of songs before you end up with one you like.

Still, the experience is surreal, as you summon emotionally expressive voices out of nowhere. You begin to feel a bit like a producer, or like the judge in a talent show that operates like an assembly line.

When you have a generation you like, you can tweak it by means of continuations. For example, you can continue a song that's a couple of minutes long from the 30-second mark. That means you're effectively overwriting the AI's initial attempt from that point on.

You can do this a few times per song before subsequent generations lose coherence. With that in mind, I've found it a helpful way to iterate.

It's easy to imagine some of the features that will likely come next:
– re-generate arbitrary subsets of the song (“0:33 to 0:45”)
– generate variations of a song
– re-use a “singer” from one song to the next
– quality improvement tools for removing noise, glitches, etc.

Needless to say, some folks will regard this technology as inherently vile, a torment nexus dreamed up by sociopathic tech bros, an act of cultural robbery at an unprecedented scale.

I don't agree with that point of view. Copyright itself is an instrument of oppression, control, and gatekeeping. Anti-AI sentiment directed at people who are using these tools for experimentation and play feels like a dangerous backslide compared with the 2000s, when remix culture was widely celebrated.

I do hope that in the coming years, rather than burning generative AI to the ground, we can use its augmentation of human labor to challenge capitalism itself.

As this post makes clear, I've experimented a bit with suno.ai. I've long enjoyed writing lyrics; being able to turn song ideas that go back to my teenage years into music I personally enjoy listening to has felt incredibly empowering. As of this writing, here are the songs I've pieced together:

“Remember”, a simple memento mori in EDM form
“Romans”, a punk song about silly rationalizations for inaction
“Ode to Dead Matter”, a slow pop adaptation of this poetic reflection on panpsychism
“Promised Land”, a rock song about the hopes and fears of a technological singularity
“False Confections”, a dumb ska song about taking fortunes a bit too literally

The songs, and their lyrics, are in the public domain. I doubt they'll get more than a few hundred plays in total, and that's fine. It's a new way of making music, and a new way of listening to it.