• Sibbo@sopuli.xyz · 1 year ago

    Now that’s interesting. I really have been waiting for something like this. I wonder if the LLM companies will now actually have to explain where their models got the detailed information about the book, or whether they can get away with claiming they have no idea how their own systems work.

    • Even_Adder@lemmy.dbzer0.com · 1 year ago

      It is legal to create new knowledge about works or bodies of works. They don’t have a leg to stand on.

    • 133arc585 · 1 year ago

      I found this comment on Hacker News to be spot-on:

      “The complaint lays out in steps why the plaintiffs believe the datasets have illicit origins — in a Meta paper detailing LLaMA, the company points to sources for its training datasets, one of which is called ThePile, which was assembled by a company called EleutherAI. ThePile, the complaint points out, was described in an EleutherAI paper as being put together from ‘a copy of the contents of the Bibliotik private tracker.’ Bibliotik and the other ‘shadow libraries’ listed, says the lawsuit, are ‘flagrantly illegal.’”

      This is the makers of the AI explicitly acknowledging that they used copyrighted works from a book-piracy website. If you downloaded a single book from that website, you could be sued and found liable for infringement. If you downloaded all of them, you could be liable for billions of dollars in damages.