Researchers find child abuse images in AI dataset used for Stable Diffusion
Stanford researchers say the Laion-5B dataset contains more than a thousand images of child abuse. Stability AI, among others, uses this dataset to train its generative AI models.
Laion-5B is a dataset of links to images scraped from sources including social media and porn sites. Researchers at the Stanford Internet Observatory say that this dataset contains more than a thousand images known to be child abuse. The researchers verified this with American and Canadian organizations that work against child abuse, by checking hashes of the Laion-5B images against hashes held by those organizations.
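The hash comparison described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual tooling: the article does not say which hash algorithm was used, so SHA-256 is an assumption here, and the names `sha256_hex` and `find_known_matches` are hypothetical. Real-world systems often rely on perceptual hashes that tolerate resizing or re-encoding, which an exact cryptographic hash does not.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_known_matches(items: dict[str, bytes], known_hashes: set[str]) -> list[str]:
    """Return the names of items whose hash appears in a reference hash list.

    Only hashes are compared, so the party doing the check never needs
    the reference material itself, just the list of its hashes.
    """
    return [name for name, data in items.items() if sha256_hex(data) in known_hashes]


# Placeholder byte strings stand in for downloaded files (assumed data).
samples = {
    "a.jpg": b"harmless-bytes-1",
    "b.jpg": b"harmless-bytes-2",
    "c.jpg": b"harmless-bytes-3",
}

# Hypothetical reference list, as it might be supplied by an outside organization.
known = {sha256_hex(b"harmless-bytes-2")}

matches = find_known_matches(samples, known)
print(matches)  # ['b.jpg']
```

Because the lookup is a set membership test, the cost per item stays roughly constant no matter how long the reference list is, which matters for a dataset with billions of entries.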
The dataset consists of more than five billion images and has been used, among other things, to train Stable Diffusion. The researchers warn that generative AI projects using Laion-5B could potentially create realistic child abuse images. Laion, the German foundation behind the dataset, tells Bloomberg that it does not tolerate such illegal content and says it will temporarily take the datasets offline to remove the content in question. The organization also indicates that it has previously released filters that should be able to stop illegal content.
Stability AI says its Stable Diffusion model is based on “a filtered subset” of Laion-5B and that the model is tuned to “counter residual behavior.” Additional filters should block unsafe user prompts and unsafe outputs, according to Stability AI. Stable Diffusion 1.5, which has fewer such filters, is said to make it easier to create sexually explicit content. The Stanford researchers therefore warn that models based on Stable Diffusion 1.5 should no longer be used.

