Google has unveiled a machine learning model that can generate pieces of music from a text prompt. MusicLM is not yet publicly available, but Google has published a research paper and audio samples online.
In the paper, Google describes MusicLM as a hierarchical sequence-to-sequence machine learning model. Given a text prompt, it can generate music at 24kHz that remains consistent over several minutes. Besides text, the model can also be conditioned on a whistled or hummed melody, or on a description of a photo or painting. Google gives the example of a painting by Salvador Dalí, from which MusicLM composes its own piece.
The tool itself is not yet available for general use, but Google has put samples online on a separate website along with the corresponding prompts. These are descriptions such as: ‘slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive’. MusicLM can also generate songs several minutes long in a story mode, where the prompt describes what should happen at different points in the song.
Google trained the model on a dataset of 280,000 hours of music. Alongside it, Google has publicly released a dataset for researchers called MusicCaps, which consists of 5,500 music descriptions paired with the original music. In the paper, Google writes that it is not releasing the model itself, both to follow best practices in machine learning research and because of the risk that it reproduces copyrighted material from its training data. That happens in roughly one percent of cases.
Update 11.11: The article initially stated that Google does not address copyrighted material, but the company does. This has been corrected.