• mrginger@lemmy.world
    English
    arrow-up
    22
    arrow-down
    1
    ·
    10 months ago

    Any one of us who actually codes/scripts knows ChatGPT spits out hot garbage when asked to produce anything beyond maybe a single short one- or two-line code snippet or bash/powershell command. Like the article said, the AI lacks context of what you’re trying to do. It will confidently spit out either completely wrong or made-up code with commands that don’t even exist.

    Also, this will go really fucking well. Don’t give them any ideas.

    Kabir said, "From our findings and observation from this research, we would suggest that Stack Overflow may want to incorporate effective methods to detect toxicity and negative sentiments in comments and answers in order to improve sentiment and politeness."

    • lloram239@feddit.de
      arrow-up
      4
      arrow-down
      1
      ·
      10 months ago

      Any one of us who actually codes/scripts knows ChatGPT spits out hot garbage

      Not my experience at all. What it spits out is almost always pretty damn close to the goal; for shell one-liners it’s easily a better programmer than I am. Sometimes it might invent API calls that don’t exist, but so would any human who isn’t allowed to look up the documentation or compile the code for testing. I don’t think I have ever seen ChatGPT spit out anything remotely close to “hot garbage”. The situations where it fails the worst are the situations where there isn’t any good solution to begin with.

      It will confidently spit out either completely wrong or made up code with commands that don’t even exist.

      And it will often be able to correct them when you tell it what’s wrong or when you provide it with compiler error messages.

  • Black Xanthus@lemmy.world
    arrow-up
    17
    ·
    10 months ago

    It’s interesting that the sharp fall in traffic mimics the fall of Twitter and Reddit.

    Anecdotally, I would find code answers on Reddit or Twitter that would direct me to Stack to view the full answer, or a more complete explanation of why X should be done that way.

    Considering the (relatively) small decline, I’m surprised that Stack think the answer is ChatGPT (or similar), and not the loss of semantic details added by a Reddit/Twitter thread.

  • Immersive_Matthew@sh.itjust.works
    English
    arrow-up
    8
    arrow-down
    1
    ·
    10 months ago

    I am using ChatGPT4+ with the code interpreter and I am finding it closer to 90% accurate when writing 50-200 lines of C# code in Unity. Beyond 200 lines it starts to have more issues and the accuracy drops. It has saved me so much time refactoring my project.

    • SirGolan@lemmy.sdf.org
      arrow-up
      7
      arrow-down
      2
      ·
      10 months ago

      Yeah. They buried it in there (and for some of their experiments just said “ChatGPT” which could mean either), but they used 3.5 and oddly enough, 3.5 gets 48% on HumanEval.

      • fristislurper@feddit.nl
        arrow-up
        7
        arrow-down
        2
        ·
        edit-2
        10 months ago

        They “buried” it in the methodology section, where they describe how they generate prompts. This is the place I expect this to be mentioned, or am I missing something? Where else would they put it?

        • SirGolan@lemmy.sdf.org
          arrow-up
          4
          arrow-down
          1
          ·
          10 months ago

          It’s a pretty important fact since there’s a huge difference between 3.5 and 4. Mentioning it once in one place is not great, plus they also just mention ChatGPT without specifying 3.5 or 4 earlier in that paragraph. The problem I have is this has led the press (and hence many other people) to think ChatGPT is terrible at coding when in fact the GPT 4 version is actually pretty decent.

  • Kuvwert@lemm.ee
    arrow-up
    12
    arrow-down
    5
    ·
    10 months ago

    52% in the first year is pretty cool, excited to see how it will evolve.

    • SirGolan@lemmy.sdf.org
      arrow-up
      5
      arrow-down
      3
      ·
      10 months ago

      GPT4 with Reflexion prompting gets 90% correct (on the HumanEval coding benchmark). The paper this is based on is misleading at best.