I know people here are very skeptical of AI in general, and there is definitely a lot of hype, but I think the progress in the last decade has been incredible.

Here are some quotes:

“In my field of quantum physics, it gives significantly more detailed and coherent responses” than did the company’s last model, GPT-4o, says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany.

Strikingly, o1 has become the first large language model to beat PhD-level scholars on the hardest series of questions — the ‘diamond’ set — in a test called the Graduate-Level Google-Proof Q&A Benchmark (GPQA). OpenAI says that its scholars scored just under 70% on GPQA Diamond, and o1 scored 78% overall, with a particularly high score of 93% in physics.

OpenAI also tested o1 on a qualifying exam for the International Mathematics Olympiad. Its previous best model, GPT-4o, correctly solved only 13% of the problems, whereas o1 scored 83%.

Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

Catherine Brownstein, a geneticist at Boston Children’s Hospital in Massachusetts, says the hospital is currently testing several AI systems, including o1-preview, for applications such as connecting the dots between patient characteristics and genes for rare diseases. She says o1 “is more accurate and gives options I didn’t think were possible from a chatbot”.

  • hotcouchguy [he/him]@hexbear.net · 49 points · 3 months ago

    Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

    Bro it was trained on your thesis

    • UlyssesT [he/him]@hexbear.net · 29 points · 3 months ago

      This is dangerously close to “prompt: say ‘I love you senpai’” and then suddenly feeling as if the treat printer really does love the computer toucher.

      People will believe what they want to believe, and even a data scientist isn’t immune to that impulse, especially when their job encourages it.

      • Barx [none/use name]@hexbear.net · 7 points · 3 months ago

        Physicist code tends to be pretty simple, particularly when it’s just implementing some closed form solution. It is also possible that a model focused on parsing the math in papers - like equations in his thesis - would just reproduce this in Python or whatever.
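For illustration, the kind of closed-form calculation meant here often fits in a few lines. This is a hypothetical sketch using a simple dynamical-mass formula, not the actual method from the thesis discussed above:

```python
# Hypothetical closed-form estimate: mass enclosed within radius r for
# material in circular orbit at speed v, M = v^2 * r / G. This is a
# stand-in example, not the method from the thesis discussed above.
G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def dynamical_mass(v_m_per_s: float, r_m: float) -> float:
    """Mass in kg enclosed within radius r (m) for circular speed v (m/s)."""
    return v_m_per_s**2 * r_m / G

# Gas orbiting at 300 km/s at a radius of 100 parsecs:
PARSEC = 3.086e16      # metres
SOLAR_MASS = 1.989e30  # kg
mass_kg = dynamical_mass(300e3, 100 * PARSEC)
print(f"{mass_kg / SOLAR_MASS:.2e} solar masses")
```

The point stands either way: code like this is a direct transcription of an equation, which is exactly the kind of thing a model trained on the math in papers could plausibly reproduce.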

  • KobaCumTribute [she/her]@hexbear.net · 44 points · 3 months ago

    All of their models have consistently done pretty well on any sort of standardized test, and then performed horribly in real use. Which makes sense: if they can train it specifically to produce something that looks like the answers to that test, it will probably be good at producing answers to that test, but it’s still fundamentally just a language parser and predictor without knowledge or any sort of internal modeling.

    Their entire approach is just so fundamentally lazy and grifty, burning massive amounts of energy on what is fundamentally a dumbshit approach to building AI. It’s like trying to make a brain by just making the speech processing lobe bigger and bigger and expecting it’ll eventually get so good at talking that the things it says will be intrinsically right instead of only looking like text.

  • InevitableSwing [none/use name]@hexbear.net · 29 points · 3 months ago

    There is definitely a lot of hype.

    I’m not being sarcastic when I say I have yet to see a single real world example where the AI does extraordinarily well and lives up to the hype. It’s always the same.

    It’s brilliant!*

    *When it’s spoonfed in a non-real-world situation. Your results may vary. Void where prohibited.

    OpenAI also tested o1 on a qualifying exam for the International Mathematics Olympiad. Its previous best model, GPT-4o, correctly solved only 13% of the problems, whereas o1 scored 83%.

    Ah, I read an article on the Mathematics Olympiad. The NYT agrees…

    Move Over, Mathematicians, Here Comes AlphaProof

    A.I. is getting good at math — and might soon make a worthy collaborator for humans.

    The problem - as always - is the US media is shit. Comments on that article by randos are better and far more informative than that PR-hype article pretending to be journalism.

    Major problem with this article: competition math problems use a standardized collection of solution techniques, it is known in advance that a solution exists, and that the solution can be obtained by a prepared competitor within a few hours.

    “Applying known solutions to problems of bounded complexity” is exactly what machines always do and doesn’t compete with the frontier in any discipline.

    ---

    Note in the caption of the figure that the problem had to be translated into a formalized statement in AlphaGeometry’s own language (presumably by people). This is often the hardest part of solving one of these problems.

    AI tech bros keep promising the moon and the stars. But then their AI doesn’t deliver so tech bros lie even more about everything to get more funding. But things don’t pan out again. And the churn continues. Tech bros promise the moon and the stars…

    • UlyssesT [he/him]@hexbear.net · 13 points · edited, 3 months ago

      The Rube Goldbergian machine that burns forests and dries up lakes needs just a few more Rube Goldbergian layers to do… what we already had, more or less, but quicker and sloppier with more errors and more burned forests and dried up lakes.

      I truly do believe that most of the loudest “AI” proselytizers are trying to convince everyone else, and perhaps themselves, that there’s more to this than what’s being presented, and just like in the cyberpunkerino treats, criticism, doubt, or even concern about the harm this technology has already done and will be doing on a larger scale is framed in a tiresome lazy “you are just Luddites afraid of the future” thought-terminating cliched way. soypoint-1 k-pain soypoint-2

      • batsforpeace [any, any]@hexbear.net · 4 points · 3 months ago

        Despite skepticism over whether nuclear fusion—which doesn’t emit greenhouse gases or carbon dioxide—will actually come to fruition in the next few years or decades, Gates said he remains optimistic. “Although their timeframes are further out, I think the role of fusion over time will be very, very critical,” he told The Verge.

        gangster-spongebob don’t worry climate folks, we will throw some dollars at nuclear fusion startups and they will make us beautiful clean energy for AI datacenters in just a few years, only a few more years of big fossil fuel use while we wait, promise

        Oracle currently has 162 data centers in operation and under construction globally, Ellison told analysts during a recent earnings call, adding that he expects the company to eventually have 1,000 to 2,000 of these facilities. The company’s largest data center is 800 megawatts and will contain “acres” of Nvidia (NVDA)’s graphics processing units (GPUs) to train A.I. models, he said.

        porky-happy I want football fields of gpus

        Ellison described a dinner with Elon Musk and Jensen Huang, the CEO of Nvidia, where the Oracle head and Musk were “begging” Jensen for more A.I. chips. “Please take our money. No, take more of it. You’re not taking enough, we need you to take more of it,” recalled Ellison, who said the strategy worked.

        NOOOOO give us more chips brooo

  • It works 100% of the time, 70% of the time now! While this is interesting, and chain-of-thought reasoning is a creative way to get better at logic, it is inefficient and expensive to the point where hiring a person is certainly cheaper. I believe the API is only available to those who have already spent $1,000 on OpenAI subscriptions.

  • hypercracker@hexbear.net · 15 points · 3 months ago

    Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

    yeah I’m gonna doubt that, or he didn’t actually compile/run/test that code. like all LLMs it’s amazing until you interact with it a bit and see how incredibly limited it is.

  • fubarx · 13 points · 3 months ago

    Tried it for python coding involving PDFs, OCR, and text substitution. Did worse than GPT-4o (which also failed).

    Gave up and told it so. At least it was very apologetic.

    • sgtlion [any]@hexbear.net · 7 points · 3 months ago

      I feel like a broken record saying this, but AI frequently does solve coding problems for me that would’ve taken hours. It can’t solve everything, and can’t handle large amounts of code, but it can be genuinely useful.

      • JoeByeThen [he/him, they/them]@hexbear.net · 8 points · edited, 3 months ago

        Same, but it has to be presented well. If you want it to work for you like a junior coding assistant, you need to talk to it like one: outline what you need, refine the prompt for caveats, and provide unique information for specialized use cases. I find it especially helpful for one-off programming in languages I’m not familiar with, or for getting past the mental block of a blank page.

        Also, there’s a lot of stuff being thrown at LLMs that really shouldn’t be. It’s not the be all end all of AI tech.

      • Barx [none/use name]@hexbear.net · 3 points · 3 months ago

        In my experience the main risks in coding are poor communication about what the thing is supposed to do and why and then translating this into a clear specification that everyone understands and can push forward on. Rarely is it about chugging away at a problem, which is mostly about typing speed and familiarity with dev tooling.

        What kinds of things has it saved time on? It has only caused headaches for those around me. At best they get something that is 90% of what they asked for, but then they need to spend just as much time tracking down the missing 10%.

        The most praise I’ve seen is for writing a bunch of tests, but to me this is actually the main way you defend a specification, that most important step I mentioned above. It’s where you get to say, “this captures what this stupid thing is supposed to do and what the expected edge cases look like”. That’s where things should be most bespoke!
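To make the “tests as a defended specification” point concrete, here is a minimal sketch; the function and its edge cases are invented for illustration:

```python
# A test suite as a spec: each assertion pins down an agreed-upon
# behaviour, including the edge cases. The function is a stand-in.
def normalize_discount(percent):
    """Clamp a discount to the agreed 0-100 range; None means no discount."""
    if percent is None:
        return 0
    return max(0, min(100, percent))

# The "specification" lives in these cases: what happens at the
# boundaries is a product decision, which is why handing this step
# to a text generator defeats its purpose.
assert normalize_discount(None) == 0   # missing input means no discount
assert normalize_discount(-5) == 0     # negative discounts are clamped
assert normalize_discount(150) == 100  # cannot discount more than 100%
assert normalize_discount(30) == 30    # ordinary case passes through
```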

        • sgtlion [any]@hexbear.net · 2 points · 3 months ago

          Diagnosing networking issues, short bash/python scripts of any and all purposes, gdb debugging, finding and learning how to use appropriate libraries, are most of my use cases. It’s not a one-and-done either, I often have to ask it to explain, or fix a broken aspect, or Google the documentation and try again, etc.
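As a sketch of the kind of short diagnostic script meant here (the hosts and ports below are placeholders):

```python
# A small network diagnostic of the kind described above: check
# whether a list of host:port pairs accept TCP connections.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example targets; substitute whatever you are actually debugging.
for host, port in [("localhost", 22), ("example.com", 443)]:
    state = "open" if port_open(host, port) else "closed"
    print(f"{host}:{port} is {state}")
```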

    • UlyssesT [he/him]@hexbear.net · 25 points · edited, 3 months ago

      apart from just screeching

      Their emotional screeching.

      Your enlightened totally-non-emotional proselytizing.

      https://futurism.com/openai-employees-say-firms-chief-scientist-has-been-making-strange-spiritual-claims

      You voluntarily came in here, leading with your passive-aggressive “screeching” framing. Don’t whine about being greeted as a clown while you’re riding a Silicon Valley unicycle and juggling cherry picked very smart and very important quotes while internalizing all those tech startup hype pitches and corporate ad copy at the same time. clown

      • FrogPrincess · +1/-2 · edited, 3 months ago

        we have consistently given other reasons. our criticisms of AI are salient and based both within the fundamental structure of the technology and the ways in which it’s employed.

        Super. Link to the best critique of AI on here?

        • RedWizard [he/him, comrade/them]@hexbear.net · 11 points · edited, 3 months ago

          Its energy consumption is absolutely unacceptable; it puts the crypto market to utter shame regarding its ecological impact. I mean, Three Mile Island Unit 1 is being recommissioned to service Microsoft datacenters instead of the 800,000 homes it could serve with its 835-megawatt output. This is being made possible thanks to taxpayer-backed loans provided by the federal government. So Americans’ tax dollars are being funneled into a private energy company to provide a private tech company 835 megawatts of power for a service it is attempting to profit from, instead of those households being provided clean, reliable energy.

          Power consumption is only one half of the ecological impact that AI brings to the table, too. The cooling requirement of AI text generation has been found to consume just over one bottle of water (519 milliliters) per 100 words, roughly the length of a brief email. In areas where electricity costs are high, these datacenters draw an insane amount of water from the local supply. In one case, The Dalles, Oregon, Google’s datacenters were using nearly a quarter of all the water available in the town. Some of these datacenters use cooling towers where external air travels across a wet medium so the water evaporates, which means they do not recycle the cooling water; it is consumed and removed from whatever supply they draw from.
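Taking the cited 519 mL per 100 words figure at face value (it is an estimate, not a measured constant), the scaling works out as:

```python
# Back-of-envelope water cost of generated text, using the
# 519 mL per 100 words estimate cited above.
ML_PER_100_WORDS = 519

def water_ml(words: int) -> float:
    """Millilitres of cooling water for a response of `words` words."""
    return words / 100 * ML_PER_100_WORDS

print(water_ml(100))     # one brief email: 519.0 mL, about a bottle
print(water_ml(10_000))  # a day of heavy use: 51900.0 mL, ~51.9 litres
```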

          These datacenters consume resources, but often do not bring economic advantages to the people living in the areas they are constructed. Instead, those people are subject to the sounds of their cooling systems (if being electrically cooled), a hit to their property value, strain on their local electric grid, and often are a massive consumer of local water (if being liquid cooled).

          Models need to be trained, and that training happens in datacenters and can at times take months to complete. The training is an expense the company pays just to get these systems off the ground. So before any productive benefit can be gained from these AI systems, you have to consume a massive amount of resources just to train the models. Microsoft’s data center used 700,000 liters of water while training GPT-3, according to the Washington Post. Meta used 22 million liters of water training its LLaMA-3 open-source AI model.

          And for what exactly? As others have pointed out in this thread, and others outside this community broadly, these models only wildly succeed when placed into a bounded test scenario. As commenters on this NYT article point out:

          Major problem with this article: competition math problems use a standardized collection of solution techniques, it is known in advance that a solution exists, and that the solution can be obtained by a prepared competitor within a few hours.

          “Applying known solutions to problems of bounded complexity” is exactly what machines always do and doesn’t compete with the frontier in any discipline.

          Note in the caption of the figure that the problem had to be translated into a formalized statement in AlphaGeometry’s own language (presumably by people). This is often the hardest part of solving one of these problems.

          These systems are only capable of performing within the bounds of existing content. They are incapable of producing anything new or unexplored. When one data scientist looked at the o1 model, he had this to say about the speed at which the o1 model constructed code that took him months to complete:

          Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

          He makes these remarks with almost no self-awareness. The likelihood that this model was trained on his very own research is very high, so naturally the system was able to provide him a solution. The data scientist labored for months creating a solution that, presumably, did not exist beforehand, and the o1 model simply internalized it. When asked to reproduce that solution, it did so. This isn’t an astonishing accomplishment; it’s a complicated, expensive, and damaging search engine that will hallucinate an answer when you ask it to produce something outside the bounds of its training.

          The vast majority of use cases for these systems by the public are not cutting-edge research. It’s writing the next 100-word email you don’t want to write, and sacrificing a bottle of water every time it does. It’s replacing jobs held by working people with a system that is often exploitable, costly, and inefficient at performing the job. These systems are a parlor trick at best, and a demon whose hunger for electricity and water is insatiable at worst.

          • UlyssesT [he/him]@hexbear.net · 8 points · 3 months ago

            You didn’t get a reply to your effortpost because the treat printer proselytizer already assumes very smartness and correctness by default and any challenge to that gets no reply.

            I hate that shit so much. It’s a plague across the techbro world.

              • UlyssesT [he/him]@hexbear.net · 6 points · 3 months ago

                So many of those credulous assholes are what I call “inevitabilists.” If they can’t convince you that internet funny money or Google Glass or AR/VR headsets or insert-techbro-fad-here isn’t just imminent but will absolutely be everywhere, and that everyone will sing its praises and you will look like a stupid dumdum for not getting in on the ground floor, they’ll reframe the promise as a threat.

            • FrogPrincess · +1/-1 · 3 months ago

              You didn’t get a reply to your effortpost because

              You said this less than 15 minutes after the good comment.

          • Hexboare [they/them]@hexbear.net · 4 points · 3 months ago

            it puts the Crypto market to utter shame regarding its ecological impact

            Crypto is still worse on electricity usage. I haven’t seen actual stats for AI-only electricity usage, but crypto uses 0.4 percent of the global electricity supply, compared to ~1.5 percent for all data centres, and I don’t think AI comprises a full third of that. The AI hype crowd projects significant increases, but that would require much better “AI” and actual use cases to produce the sort of growth that would make it a substantial issue.

            Evaporative losses are vastly worse for agriculture with open irrigation. 22 million liters of water sounds like a lot, but it’s only 0.022 gigalitres. Google used a total of ~25 gigalitres across all their data centres, while Arizona uses about 8,000 gigalitres a year.

            For The Dalles example, Google used ~1.3 gigalitres in a town with a population of 15,000–25,000 people, so 25 percent for a massive data centre is not unreasonable.

            As you note, it’s junk, so it’s a waste of resources either way, but unless they manage to double the industry year on year (doubtful), it won’t be a huge issue.
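The unit comparisons above can be sanity-checked directly (all figures as cited in this thread):

```python
# Sanity-checking the scale comparisons: litres vs gigalitres,
# using the figures quoted in the comments above.
LITRES_PER_GL = 1e9

llama3_training_l = 22e6     # Meta's LLaMA-3 training water, litres
google_total_gl = 25         # Google datacentres, approx., gigalitres
arizona_gl_per_year = 8000   # Arizona annual water use, gigalitres

print(llama3_training_l / LITRES_PER_GL)      # 0.022 GL
print(google_total_gl / arizona_gl_per_year)  # ~0.3% of Arizona's annual use
```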

          • FrogPrincess · +2/-2 · 3 months ago

            See?

            This is so much more credible than going “I hate AI, AI is shit”

            Posters like UlyssesT making everyone look bad.

            • UlyssesT [he/him]@hexbear.net · +8/-1 · 3 months ago

              You didn’t post anything meaningful in return, just concern trolling and whining about my lack of civility.

              You came here claiming the naysayers were “screeching” from the very start. You don’t deserve civility, and your cowardice demonstrates you had nothing to offer in response except smugposting and blind faith in your billionaire masters.

            • BelieveRevolt [he/him]@hexbear.net · 6 points · 3 months ago

              Shut the fuck up. Nobody wanted to respond to you (except RedWizard, he must have the patience of a saint) because we’ve already done this topic to death, and leading with an ableist meme doesn’t exactly imply you’re acting in good faith.

            • RedWizard [he/him, comrade/them]@hexbear.net · +5/-1 · 3 months ago

              Hey, you’re talkin’ about my man UlyssesT all wrong, it’s the wrong tone. You do it again, and I’ll have to pull out the PPB. Still nothing to say though, I see. Do you not have much of a defense against the idea that the slop slot machine everyone worships is destroying communities and the ecosystem at large? I’m not sure how you can look at the comment I left and have so little to say about these truths. Do you believe the ends justify the means in some way? What is it?

        • hypercracker@hexbear.net · 7 points · 3 months ago

          What a stupid thing to say. We don’t sit around writing long posts DEBOONKING every techbro talking point about AI for reference every time someone comes through. Since you care enough to post this way, I’m sure you’ve seen all the criticism others have posted online and still come away with

          smuglord

    • hypercracker@hexbear.net · 10 points · edited, 3 months ago

      The thing I keep coming back to about AI is that I have used it to help on only one pull request in the last few years, and that was the only one that got reverted. It is incapable of basic metacognition: communicating how confident it is in the answer it comes up with.

      Plus, I gave it a question about a fairly obscure topic and it spat out a blog post I wrote on that topic, nearly verbatim. So there is, you know, the mass-plagiarism aspect of corporations turning the creative output of all humans into a money-making machine only for themselves.