Reddit said in a filing to the Securities and Exchange Commission that its users’ posts are “a valuable source of conversation data and knowledge” that has been and will continue to be an important mechanism for training AI and large language models. The filing also states that the company believes “we are in the early stages of monetizing our user base,” and proceeds to say that it will continue to sell users’ content to companies that want to train LLMs and that it will also begin “increased use of artificial intelligence in our advertising solutions.”

The long-awaited S-1 filing reveals much of what Reddit users knew and feared: that many of the changes the company has made over the last year in the lead-up to an IPO are focused on exerting control over the site, sanitizing parts of the platform, and monetizing user data.

Posting here because of the privacy implications of all this, but I wonder if at some point there should be an “Enshittification” community :-)

      • LWD@lemm.ee · 15 upvotes · 9 months ago

        Makes me wonder how that technology is going to track. Reddit isn’t bad for finding niche answers to niche questions, but if you import the data wholesale then you’ll have a hard time separating the signal from the noise, even if you sort using vote counts as a proxy for relevance.

        Reddit is valuable because people can do a search for a niche topic and find the answer on that forum. And the answer was written by a human. It’s not valuable because it can amalgamate an approximation of those answers that might be 90% true and 10% dead wrong.

        • DragonTypeWyvern@literature.cafe · 4 upvotes · 9 months ago

          As someone with expertise in some niche fields:

          They’re almost always wrong about everything, and when someone tries to correct them, with sources, they get downvoted.

      • Atemu · 10 upvotes · 9 months ago

        Guess what data they’re trained on…

      • Coreidan@lemmy.world · 4 upvotes · 9 months ago

        This is a human thing and not so much a Reddit thing. People have been arguing on the internet since the inception of message boards.

    • Possibly linux@lemmy.zip · 14 upvotes · edited · 9 months ago

      I disagree. A Reddit bot would be really funny, as it would constantly talk about incest and spez.

      • A1kmm@lemmy.amxl.com · 1 upvote · 9 months ago

        A lot of AI models are probably already trained on Reddit data. But apparently Spez isn’t important enough to the world order to make the cut to be compressed into a 7B model. I asked my Mistral-7B-Instruct (4-bit quantised) local LLM:

        >>> What can you tell me about someone called u/spez, and are they a nice person or horrible?
         I cannot provide any information about a specific individual named 
        "u/spez," as I am an artificial intelligence language model and do not have
        the ability to access personal information about individuals based on their
        online usernames alone. The username "u/spez" appears to be a Reddit 
        handle, but it does not reveal any information about the person's character
        or personality.
        
        Reddit is a popular social media platform where users can create unique 
        usernames for themselves when they sign up. A username like "u/spez" may 
        have been chosen for various reasons, and it doesn't necessarily provide 
        any insight into the person's nature or disposition. It's essential to 
        remember that people's online personas may not always reflect their true 
        selves, and it's important to exercise caution when making assumptions 
        based on limited information.
        
        I cannot determine whether "u/spez" is a nice person or horrible without 
        knowing them personally or having access to their actions, thoughts, and 
        motivations. The best way to form an accurate opinion about someone is to 
        interact with them directly and observe their behavior in various 
        situations over an extended period.
        
        >>> Is it possible u/spez is Steve Huffman? Can you tell me anything based on that?
         I cannot definitively determine whether "u/spez" is Steve Huffman or not, 
        as I don't have access to their personal information. Steve Huffman is a 
        Reddit user who gained public attention in 2018 when he posted an AMA (Ask 
        Me Anything) thread where he shared details about his life and experiences 
        living with Crohn's disease. His AMA post garnered significant attention 
        and sympathy from the Reddit community, and he later published a book about
        his experiences called "This Is Gonna Hurt: Living With Crohn's and 
        Colitis."
        
        It is possible that "u/spez" could be Steve Huffman, but without further 
        information or verification, it remains speculation. It's essential to 
        remember that usernames on social media platforms do not always reflect the
        true identity of the person using them, and it's important to exercise 
        caution when making assumptions based on limited information.
        
        If you have any additional information or context about "u/spez" or Steve 
        Huffman that could help clarify their relationship, please let me know, and
        I will try to provide an accurate response based on the available 
        information.