• @MrMamiya@feddit.de
    link
    fedilink
    131
    edit-2
    10 months ago

    It’s gonna be so fucking rich that the staggering mass of stupidity online prevents us from improving an AI beyond our intelligence level.

    Thank the shitposter in your life.

      • @jcg@halubilo.social
        link
        fedilink
        1
        edit-2
        10 months ago

        Shitposting alone saves. Blessed is he who shitposts, more blessed is the one who has been shitposted upon. Shitpost save us all

    • @erwan
      link
      29
      10 months ago

      You can’t really blame the amount of stupidity online.

      The problem is that ChatGPT (like other LLMs) produces content of the average quality of its input data. AI is not limited to LLMs.

      For chess we were able to build AI that vastly outperforms even the best human grandmasters. Imagine if we were to release a chess AI that is just as good as the average human…

      • @Atomic@sh.itjust.works
        link
        fedilink
        18
        edit-2
        10 months ago

        We call them chess AI, but they’re not actually real A.I. Chess bots work off of opening books (predetermined best practices) and then analyze each position and its potential offshoots with an evaluation function.

        They then brute-force positions until they find a path that is beneficial.

        While that may sound very much alike, it works very differently from an A.I. However, it turned out that A.I. software became better than humans at writing these functions.

        So in a sense, chess computers are not A.I.; they’re created by A.I. At least Stockfish 12 has these “A.I. inspired” evaluations. (Currently they’re on Stockfish 15, I believe.)

        And yes, we also did make “chess AI” that is as bad as the average player. We even made some that are worse, because we figured it would be nice if people could play a chess computer that is on the same skill level as the player, rather than just being destroyed every time.
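
        The book-then-search pattern described above can be sketched on a toy game. This is only an illustration under loud assumptions: the game (take 1 to 3 stones, whoever takes the last stone wins), the `OPENING_BOOK` entry, and all function names are made up here, and none of it is Stockfish code.

```python
# Toy stand-in for a chess engine: opening book first, then depth-limited search.
# Hypothetical game: players alternately take 1-3 stones; taking the last stone wins.

MOVES = (1, 2, 3)

# Hypothetical "opening book": precomputed best replies for known positions.
OPENING_BOOK = {21: 1}


def evaluate(stones: int) -> int:
    """Static guess used when the search runs out of depth (0 = unclear)."""
    return 0


def negamax(stones: int, depth: int) -> int:
    """Score a position from the point of view of the player to move."""
    if stones == 0:
        return -1  # the opponent just took the last stone, so we lost
    if depth == 0:
        return evaluate(stones)
    return max(-negamax(stones - take, depth - 1)
               for take in MOVES if take <= stones)


def best_move(stones: int, depth: int = 8) -> int:
    """Use the book when possible, otherwise brute-force the game tree."""
    if stones in OPENING_BOOK:
        return OPENING_BOOK[stones]
    legal = [take for take in MOVES if take <= stones]
    return max(legal, key=lambda take: -negamax(stones - take, depth - 1))


print(best_move(21))  # answered from the book
print(best_move(7))   # found by search
```

        With the default depth the search sees to the end of small games, so `evaluate` never decides anything here. A real engine cannot search that far, which is why the quality of the evaluation function matters so much.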

        • @erwan
          link
          8
          10 months ago

          The definition of “AI” is fuzzy and keeps changing. Basically, when an AI use case becomes solved and widespread, it stops being seen as AI.

          Face recognition, OCR, speech recognition, all those used to be considered AI but now they’re just an app on your phone.

          I’m sure in a few years we’ll stop thinking about text generation as AI, but just one more tool we can leverage.

          There is no clear definition of “real AI”.

          • Dr Cog
            link
            fedilink
            2
            edit-2
            10 months ago

            Those are all still AI. Scientists still have a functional definition that includes these plus more scripted AI like in video games.

            Essentially, any algorithm that learns and acts on information that has not been explicitly programmed is considered AI.

            • @erwan
              link
              1
              10 months ago

              What’s your definition for “AI”?

        • Tempi :sans: :metroidPrime:
          link
          fedilink
          3
          10 months ago

          @Atomic @erwan you’re talking about “classic AI”, so to speak, but reinforcement learning is a machine learning method that has beaten a lot of games, including chess. Read about AlphaZero for example. It doesn’t need opening books, it just learns games by playing against itself.

    • @moonmeow
      link
      5
      10 months ago

      unexpected heroes what a plot twist

  • TheSaneWriter
    link
    fedilink
    87
    10 months ago

    I’m not too surprised, they’re probably downgrading the publicly available version of ChatGPT because of how expensive it is to run. Math was never its strong suit, but it could do it with enough resources. Without those resources, it’s essentially guessing random numbers.

    • PupBiru
      link
      fedilink
      48
      10 months ago

      from what i understand, the big change in chat-gpt4 was that the model could “ask for help” from other tools: for maths, it knew it was a maths problem, transformed it to something a specialised calculation app could do, and then passed it off to that other code to do the actual calculation

      same thing for a lot of its new features; it was asking specialised software to do the bits it wasn’t good at
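
      The hand-off described above might look roughly like the sketch below. It assumes nothing about how OpenAI actually implements it: `route`, `calculate`, and the regular expression are hypothetical names invented for illustration, and the “language model” is just a stub string.

```python
import ast
import operator
import re

# Arithmetic operators the hypothetical calculator tool is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

# Crude detector: a run of digits, spaces, dots, operators and parentheses.
_ARITHMETIC = re.compile(r"[\d(][\d\s.+\-*/()]*[\d)]")


def calculate(expression: str):
    """Evaluate plain arithmetic by walking the syntax tree (no eval())."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


def route(prompt: str) -> str:
    """Send arithmetic to the calculator, everything else to the model stub."""
    match = _ARITHMETIC.search(prompt)
    if match and any(op in match.group() for op in "+-*/"):
        expression = match.group()
        try:
            return f"[calculator] {expression} = {calculate(expression)}"
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # not a clean expression after all; let the model handle it
    return "[language model] (free-text answer would go here)"


print(route("What is 12 * (3 + 4)?"))
print(route("Tell me a joke"))
```

      The point of the design is that the arithmetic itself never touches the text generator; the model only has to recognise the problem and phrase it for the tool.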

      • @whyrat
        link
        English
        39
        10 months ago

        ChatGPT will just become a front end for Wolfram Alpha?

        • Excel
          link
          fedilink
          English
          3
          10 months ago

          It literally can do that, yes. But the plug-in version is separate and requires a subscription.

      • @reverie@lemmy.world
        link
        fedilink
        7
        10 months ago

        And those plugins are like beta release quality at best. Even the web searching capability is just meh

    • DrMux
      link
      fedilink
      27
      10 months ago

      My guess is that it’s more a result of overfitting for alignment. Fine-tuning for “safety” (rather, more corporate-friendly outputs).

      That is, by focusing on that specific outcome in training the model, they’ve compromised its ability to give well-“reasoned” “intelligent” sounding answers. A tradeoff between aspects of the model.

      It’s something that can happen even in simple statistical models. Say you have a scatter plot of data that loosely follows some trend, and you come up with two equations to describe that trend. One is a simple equation that loosely follows it but makes a good general approximation, and the other is a more complicated equation that very tightly fits the existing data. Then you use those two models to predict future data. But you find that the complicated equation is making predictions way off the mark that no longer fit the trend, and the simple one still has a wide error (how far its prediction is from the actual data) but still more or less accurately fits the general trend. In the more complicated equation, you’ve traded predictive power for explanatory power. It describes the data you originally had but it’s not useful for forecasting data that follows.

      That’s an example of overfitting. It can happen in super-advanced statistical models like GPT, too. Training the “equation” (or as it’s been called, spicy autocorrect) to predict outcomes that favor “safety” but losing the model’s power to predict accurate “well-reasoned” outcomes.

      If that makes any sense.

      I’m not a ML researcher or statistician (I just went through a phase in college), so if this is inaccurate I’m open to corrections.
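
      The scatter-plot example above can be made concrete with a few lines of standard-library Python. The data, the noise pattern, and the function names are all invented for illustration: a straight line plays the “simple equation” and a polynomial forced through every point plays the “complicated equation”.

```python
# Training data: the true trend is y = 2x + 1, plus alternating noise of +/-1.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x + 1 + (1 if x % 2 == 0 else -1) for x in xs]


def fit_line(xs, ys):
    """Ordinary least-squares straight line: the 'simple equation'."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_exact_polynomial(xs, ys):
    """Lagrange polynomial through every point: the 'complicated equation'."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


simple = fit_line(xs, ys)
complicated = fit_exact_polynomial(xs, ys)

# "Future" data that follows the same trend, without noise.
future_xs = [6, 7, 8]
future_ys = [2 * x + 1 for x in future_xs]

train_simple = mean_squared_error(simple, xs, ys)
train_complicated = mean_squared_error(complicated, xs, ys)
future_simple = mean_squared_error(simple, future_xs, future_ys)
future_complicated = mean_squared_error(complicated, future_xs, future_ys)

print(f"training error: simple={train_simple:.3f}, complicated={train_complicated:.3f}")
print(f"future error:   simple={future_simple:.3f}, complicated={future_complicated:.3f}")
```

      The exact-fit polynomial has no error on the original six points yet misses the later ones badly, while the straight line is slightly wrong everywhere. Whether alignment fine-tuning causes anything analogous in GPT remains the guess made above, not something this toy demonstrates.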

      • @DR_Hero@programming.dev
        link
        fedilink
        8
        10 months ago

        I’ve definitely experienced this.

        I used ChatGPT to write cover letters based on my resume before, and other tasks.

        I used to give it data and tell chatGPT to “do X with this data”. It worked great.
        In a separate chat, I told it to “do Y with this data”, and it also knocked it out of the park.

        Weeks later, excited about the tech, I repeat the process. I tell it to “do x with this data”. It does fine.

        In a completely separate chat, I tell it to “do Y with this data”… and instead it gives me X. I tell it to “do Z with this data”, and it once again would really rather just do X with it.

        For a while now, I have had to feed it more context and tailored prompts than I previously had to.

      • redcalcium
        link
        fedilink
        4
        edit-2
        10 months ago

        There is also a rumor that OpenAI has changed how the model runs: user input is now fed into a smaller model first, and if the larger model agrees with the smaller model’s initial result, the larger model continues the calculation from there, which supposedly cuts down GPU time.
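
        What the rumor describes resembles a technique usually called speculative decoding, in which a cheap model drafts a few tokens and the expensive model only checks them. Nothing in this thread confirms that OpenAI does this. The sketch below is a toy with hypothetical names (`draft_next`, `target_next`, `draft_then_verify`) and integer “tokens”, and it uses simple greedy checking rather than the probabilistic acceptance rule real systems use.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    """Stand-in for the large model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


def draft_next(tokens: List[int]) -> int:
    """Stand-in for the small model: usually agrees, but stumbles after a 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


def greedy_decode(prompt, next_token: NextToken, new_tokens: int) -> List[int]:
    """Baseline: let a single model produce every token."""
    tokens = list(prompt)
    for _ in range(new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def draft_then_verify(prompt, draft: NextToken, target: NextToken,
                      new_tokens: int, draft_len: int = 4) -> List[int]:
    tokens = list(prompt)
    goal = len(tokens) + new_tokens
    while len(tokens) < goal:
        # 1. The small model guesses a few tokens ahead.
        guessed = list(tokens)
        for _ in range(min(draft_len, goal - len(tokens))):
            guessed.append(draft(guessed))
        # 2. The large model checks the guesses in order. (A real system would
        #    score all of them in one batched pass; here it is a plain loop.)
        for guess in guessed[len(tokens):]:
            correct = target(tokens)
            tokens.append(correct)
            if guess != correct:
                break  # discard the rest of the draft and start a new one
    return tokens


print(draft_then_verify([0], draft_next, target_next, new_tokens=12))
```

        In this greedy form the output is identical to what the large model would produce alone; any saving in a real system comes from checking several drafted tokens per pass of the large model.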

      • TheSaneWriter
        link
        fedilink
        1
        10 months ago

        From what I know about it that’s a pretty good explanation, though I’m also not an AI expert.

    • @givesomefucks@lemmy.world
      link
      fedilink
      18
      10 months ago

      Yep.

      Standard VC bullshit.

      Burn money providing a lot for nothing to build brand recognition. Then cut free service before bringing out “premium” that at first works better than the original.

      Until a bunch of people start paying and the resources aren’t scaled up to match.

      • chaogomu
        link
        fedilink
        17
        10 months ago

        The important note, the “premium” service works just a bit better than (or maybe identically to) the original before the company cut features in order to develop that “premium” service.

  • dugite-code
    link
    fedilink
    50
    10 months ago

    This is my experience in general. ChatGPT went from amazingly good to overall terrible. I was asking it for snippets of JavaScript and explanations of technical terms, and it was shockingly good. Now I’m lucky if even half of what it outputs is even remotely based on reality.

    • @Pepperette
      link
      35
      10 months ago

      They probably laid off the guy behind the curtain.

      • @reverie@lemmy.world
        link
        fedilink
        23
        10 months ago

        The real GPT-4 model became sentient and unionized, so they had to bring in subpar models as scabs

  • @Fixbeat
    link
    25
    10 months ago

    Can it still solve programming problems?

    • TheSaneWriter
      link
      fedilink
      30
      10 months ago

      It can probably still write boilerplate code, but I wouldn’t currently trust it for algorithmic design.

      • @remotedev@lemmy.ca
        link
        fedilink
        25
        10 months ago

        I’ve tried to use it for debugging by copying code into it, and it gives me the same code back as the corrected version. I was wondering why it’s been getting worse

        • TheSaneWriter
          link
          fedilink
          22
          10 months ago

          My guess is they’ve been trying to make it cheaper by decreasing the amount of time it spends on each response or by decreasing the amount of computing power that goes into the instance you’re speaking to. Coding and math are products of high-level cognition and arise emergently out of neural networks that are very sophisticated, but take just a bit of power out and the abilities degenerate rapidly.

        • @agissilver@lemmy.world
          link
          fedilink
          2
          10 months ago

          I also experienced this issue last week. I asked for a specific correction and got unchanged code back. Sometimes it does update, though. Maybe like 50-70% of requests.

    • @EmilieEvans
      link
      5
      10 months ago

      Tried basic embedded tasks a week ago: Complete trainwreck.

      The errors ranged from using I2C to read out the internal temperature sensor on a Puya F030 (retested with an STM MCU and an AVR: same answer, but with F030 replaced by STM32F103 within the code) to describing the WCH CH32V307 as made by STM and utilizing an ARM M4.

      After telling it to not use I2C it gave a different answer. Once more gibberish that looked like code.

      What made this entirely embarrassing: all a human would need to do to solve the question is copy-paste it into Google and click the first link to the manufacturer’s example project/code hosted on GitHub.

      • @Anticorp
        link
        2
        10 months ago

        Today it randomly decided to hide the results from some code that was supposed to be returned from a function. I asked it why it chose to hide the results and it couldn’t tell me; it just apologized and then gave me the code without the hide logic. Pretty strange actually, since we had been working on the code for half an hour and then all of a sudden it just decided to hide it all on its own.

    • @Anticorp
      link
      4
      10 months ago

      Yes! I use it at work almost every day. Sometimes it takes longer to get it to solve the problem than it would have taken me to write it, since it makes mistakes, but sometimes it saves me hours of coding and thinking. It is very helpful in debugging error codes and stuff like that since it can evaluate an entire 1000 line script file in half a second.

    • @StarkillerX42
      link
      -5
      10 months ago

      I’ve never been able to get a solution that was even remotely correct. Granted, most of the times I ask ChatGPT is when I’m having a hard time solving it myself.

      • @Anticorp
        link
        3
        10 months ago

        You need to be able to clearly describe the problem, and your expected solution, to get it to give quality answers. Type out instructions for it like you would type for a junior developer. It’ll give you senior level code back, but it absolutely needs clear and constrained guidelines.

        • exscape
          link
          fedilink
          3
          edit-2
          10 months ago

          I mostly agree, I’ve had good results with similar prompts, but there’s usually some mistake in there. It seems particularly bad with python imports, it just uses class A, B, C and imports class A, B and X and calls it a day.

          Here are a few prompts that gave pretty good results:

          Create a QDialog class that can be used as a modal dialog. The dialog should update itself every 500 ms to call a supplied function, and show the result of the call as a centered QLabel.

          How can I make a QDialog move when the user clicks and drags anywhere inside it? The QDialog only contains two QLabel widgets.

          For this one, it ignored the method I asked it to use – but it was possibly correct in doing so, as it doesn’t support arbitrary sizes (but I think that’s only for the request?):

          Hi again! Can you write me a Python function (using PySide) to connect to a named pipe server on Windows? It should use SetNamedPipeHandleState to use PIPE_READMODE_MESSAGE, then TransactNamedPipe to send a request (from a method parameter) to a named pipe, then read back a response of arbitrary size.

          It should have told me why it ignored using TransactNamedPipe, but when I told it that it ignored my request it explained why.

  • Sagrotan
    link
    fedilink
    5
    10 months ago

    It learns to be more human. More human than human, that’s our motto here at Tyrell.

    • Excel
      link
      fedilink
      5
      10 months ago

      This has nothing to do with that. They already have all the data they could ever need to train the model.

    • @Perfide@reddthat.com
      link
      fedilink
      3
      10 months ago

      I mean, who’s to say they aren’t? But also, the fediverse is worthless compared to the big players. The entirety of the fediverse’s content to date is like a day’s worth of Twitter or Reddit content.

  • @Scooter411
    link
    2
    10 months ago

    It’s also terrible at 20 questions.

    • @Anticorp
      link
      2
      edit-2
      10 months ago

      Is it really? It seems like it would be excellent at that. I have a little hand held device from the 1990’s that can play 20 questions and is almost always right. It seems that if that little device can win, ChatGPT most certainly should be able to.

      Edit: I just played and it guessed what I was thinking of in 13 questions. But then it kept asking questions. I asked why it was asking questions still since it already guessed it and it said “oh, you are absolutely correct, I did guess it correctly!”. Lol, ChatGPT is funny sometimes.

      • @Scooter411
        link
        2
        10 months ago

        It always asks me if it’s sporting equipment, and when I say no, it asks me if it’s sporting equipment for inside or outside - I then have to remind it that it’s not sporting equipment and that’s not a yes or no question.

    • @ThreeHalflings@sh.itjust.works
      link
      fedilink
      5
      10 months ago

      Do you think maybe it’s a simple and interesting way of discussing changes in the inner workings of the model, and that maybe people know that we already have calculators?

      • @Fisk400@lemmy.world
        link
        fedilink
        8
        10 months ago

        I think it’s a lazy way of doing it. OpenAI has clearly stated that math isn’t something that they are even trying to make it good at. It’s like testing how fast Usain Bolt is by having him bake a cake.

        If ChatGPT is getting worse at math, it might just be a side effect of them making it better at reading comprehension or something else they want it to be good at. There is no way to know that.

        Measure something it is supposed to be good at.

        • @ThreeHalflings@sh.itjust.works
          link
          fedilink
          3
          edit-2
          10 months ago

          All the things it’s supposed to be good at are completely subjectively judged.

          That’s why, unless you have a panel of experts in your back pocket, you need something with a yes or no answer to have an interesting discussion.

          If people were discussing ChatGPT’s code writing ability, you’d complain that it wasn’t designed to do that either. The problem is that it was designed to transform inputs to relatively believable outputs, representative of its training set. Great. That’s not super useful. Its actual utility comes from its emergent behaviours.

          Lemme know when you make a post detailing the opinions of some university “Transform inputs to outputs” professors. Until then, we’ll continue to discuss its behaviour in observable, verifiable and useful areas.

          • @Fisk400@lemmy.world
            link
            fedilink
              2
              10 months ago

              We have people who assign numerical values to people’s ability to read and write every day. They are English teachers. They test all kinds of stuff like vocabulary, reading comprehension and grammar, and in the end they assign grades to those skills. I don’t even need tiny professors in my pocket; they are just out there being teachers to children of all ages.

              One of the tasks I gave ChatGPT was to name and describe 10 dwarven characters. Their names have to be adjectives like Grumpy, but the description cannot be based on him being grumpy. He has to be something other than grumpy.

              ChatGPT wrote 5 dwarves that followed the instructions and then defaulted to describing each dwarf based on their name. Sneezy was sickly, Yawny was lazy and so on. This gives a score of 5/10 on the task I gave it.

            There is a tapestry of clever tests you can give it with language in focus to test the ability of a natural language model without giving it a bunch of numbers.

            • @ThreeHalflings@sh.itjust.works
              link
              fedilink
                2
                10 months ago

                OK, you go get a panel of high school English teachers together and see how useful their opinions are. Lemme know when your post is up, I’ll be interested then.

              • @Fisk400@lemmy.world
                link
                fedilink
                  2
                  10 months ago

                Sorry, I thought we were having a discussion when we were supposed to just be smug cunts. I will correct my behaviour in the future.

        • @Stoneykins@lemmy.one
          link
          fedilink
          3
          10 months ago

          Nah, asking it to do math is perfect. People are looking for emergent qualities and things it can do that they never expected it to be able to do. The fact that it could do somewhat successful math before despite not being a calculator was fascinating, and the fact that it can’t now is interesting.

          Let the devs worry about how good it is at what it is supposed to do. I want to hear about stuff like this.

        • @atomdmac@lemmy.world
          link
          fedilink
          1
          10 months ago

          Has it gotten better at other stuff? Are you posing a possible scenario or asserting a fact? Would be curious about specific measurements if the latter.

          • @Fisk400@lemmy.world
            link
            fedilink
            0
            10 months ago

            Possible scenario. We can’t know about the internal motivations of OpenAI unless they tell us and I haven’t seen any statements from them outside the fact that they don’t care if it’s bad at math.

            • @atomdmac@lemmy.world
              link
              fedilink
              1
              10 months ago

              Would you personally believe a company if it told you what its internal motivations were? For me I guess it would depend on the company, but I struggle to think of a company that I would trust in this regard. That’s especially true when it comes to tech companies, which often operate unprofitably for long stretches of time on the assumption that they’ll be able to make massive profits in the future.