Artificial intelligence researchers said Friday they have deleted more than 2,000 web links to suspected child sexual abuse imagery from a dataset used to train popular AI image-generator tools.

The LAION research dataset is a huge index of online images and captions that’s been a source for leading AI image-makers such as Stable Diffusion and Midjourney.

But a report last year by the Stanford Internet Observatory found it contained links to sexually explicit images of children, contributing to the ease with which some AI tools have been able to produce photorealistic deepfakes that depict children.

That December report led LAION, which stands for the nonprofit Large-scale Artificial Intelligence Open Network, to immediately remove its dataset. Eight months later, LAION said in a blog post that it worked with the Stanford University watchdog group and anti-abuse organizations in Canada and the United Kingdom to fix the problem and release a cleaned-up dataset for future AI research.

Stanford researcher David Thiel, author of the December report, commended LAION for significant improvements but said the next step is to withdraw from distribution the “tainted models” that are still able to produce child abuse imagery.

    • Iapar@feddit.org
      2 months ago

      Mu.

      I wouldn’t use an amount of images I couldn’t check. I wouldn’t use images from unchecked sources. I wouldn’t make money from sexually exploited children.

      And I think people that don’t see the most obvious solution to that are fucked in the head.

      • istanbullu
        2 months ago

        That won’t work. Models of this kind need billions of images or they are trash.