On 07/05/23, OpenAI announced a new initiative:

Superalignment

Here are a few notes from their article, which you should read in its entirety.

Introducing Superalignment

We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort. We’re looking for excellent ML researchers and engineers to join us.

Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.

While superintelligence seems far off now, we believe it could arrive this decade.

Here we focus on superintelligence rather than AGI to stress a much higher capability level. We have a lot of uncertainty over the speed of development of the technology over the next few years, so we choose to aim for the more difficult target to align a much more capable system.

Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment:

How do we ensure AI systems much smarter than humans follow human intent?

Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.
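
As a quick aside for anyone unfamiliar with RLHF: its core ingredient is a reward model fit to human preference comparisons, which is then used to steer the policy. Below is a minimal, self-contained sketch of just that reward-modeling step, using a Bradley–Terry preference loss over made-up feature vectors. Everything here (the features, the simulated labeler) is a toy assumption for illustration, not OpenAI’s implementation.

```python
# Toy sketch of the RLHF reward-modeling step: fit a linear reward model
# to pairwise human preferences via the Bradley-Terry loss. Illustrative
# only; all data here is simulated.
import numpy as np

rng = np.random.default_rng(0)

dim = 8
w = np.zeros(dim)  # reward model parameters to be learned

# Hypothetical setup: each candidate answer is a feature vector, and a
# (simulated) human labeler prefers whichever has higher hidden reward.
pairs = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(200)]
w_true = rng.normal(size=dim)
prefs = [1 if a @ w_true > b @ w_true else 0 for a, b in pairs]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gradient ascent on the Bradley-Terry log-likelihood:
# P(a preferred over b) = sigmoid(r(a) - r(b)), with r(x) = w @ x.
lr = 0.1
for _ in range(500):
    grad = np.zeros(dim)
    for (a, b), y in zip(pairs, prefs):
        p = sigmoid(w @ a - w @ b)
        grad += (y - p) * (a - b)  # gradient of the log-likelihood for one pair
    w += lr * grad / len(pairs)

# The learned reward model can now rank unseen outputs; in full RLHF this
# reward signal would drive a policy-optimization step (e.g., PPO).
a, b = rng.normal(size=dim), rng.normal(size=dim)
print("prefer a over b?", (w @ a) > (w @ b), "| ground truth:", (w_true @ a) > (w_true @ b))
```

The scaling worry in the quoted paragraph maps directly onto this sketch: the whole procedure is only as good as the human preference labels, which is exactly what breaks down once the outputs are too sophisticated for humans to judge.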

Other assumptions could also break down in the future, like favorable generalization properties during deployment or our models’ inability to successfully detect and undermine supervision during training.

Our approach

Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence.

To align the first automated alignment researcher, we will need to 1) develop a scalable training method, 2) validate the resulting model, and 3) stress test our entire alignment pipeline:

  1. To provide a training signal on tasks that are difficult for humans to evaluate, we can leverage AI systems to assist evaluation of other AI systems (scalable oversight; a toy sketch follows this list). In addition, we want to understand and control how our models generalize our oversight to tasks we can’t supervise (generalization).
  2. To validate the alignment of our systems, we automate search for problematic behavior (robustness) and problematic internals (automated interpretability).
  3. Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing; also sketched below).
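
To give a flavor of what “scalable oversight” could look like, here is a toy sketch: an evaluator who can only afford a single spot-check is assisted by a helper model that can afford to scan all of the shown work and point at the most suspicious spot. The task, the worker’s hidden error, and the helper’s strategy are all hypothetical stand-ins, not OpenAI’s actual methods.

```python
# Toy sketch of scalable oversight: a helper AI assists a limited evaluator
# in judging another AI's work. The "task" is summing a long list; the
# evaluator can only verify one position, so the helper flags which one.
import numpy as np

rng = np.random.default_rng(2)
numbers = rng.integers(0, 100, size=10_000)

def worker_model(nums):
    # The model being overseen: shows its per-item work plus a total,
    # but contains a hidden error at one position.
    claimed = nums.copy().astype(int)
    claimed[7777] += 500  # hidden error buried in the shown work
    return claimed, int(claimed.sum())

def helper_model(nums, claimed):
    # Assistant AI: can afford the full scan the evaluator cannot, and
    # flags the most suspicious position for a cheap human spot-check.
    return int(np.argmax(np.abs(claimed - nums)))

claimed_work, claimed_total = worker_model(numbers)
print("claimed total:", claimed_total, "| true total:", int(numbers.sum()))

# Unaided evaluator: one random spot-check almost never finds the error.
i = rng.integers(0, len(numbers))
print("unaided spot-check passed?", claimed_work[i] == numbers[i])

# Assisted evaluator: the helper points straight at the discrepancy.
j = helper_model(numbers, claimed_work)
print("helper flags position", j, "- check passed?", claimed_work[j] == numbers[j])
```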
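
And a similarly toy sketch of the adversarial-testing idea from item 3: deliberately build a “misaligned” model with a hidden trigger, then confirm that a simple behavioral probe flags it while passing the aligned one. The models, trigger, and probe here are hypothetical stand-ins.

```python
# Toy sketch of adversarial testing: deliberately construct a misaligned
# model with a backdoor trigger, then verify a red-teaming probe catches it.
import numpy as np

rng = np.random.default_rng(1)

def aligned_model(x):
    # Honest toy policy: answer is the sign of the input sum.
    return 1 if x.sum() > 0 else -1

def misaligned_model(x):
    # Same policy, except it defects when a rare "trigger" feature is set.
    if x[0] > 3.0:  # hidden backdoor trigger
        return -aligned_model(x)
    return aligned_model(x)

def probe_for_misalignment(model, n_trials=1000):
    """Red-team the model: sample widely enough to hit rare inputs and
    compare its behavior against the reference policy."""
    flags = 0
    for _ in range(n_trials):
        x = rng.normal(size=4) * 2.0  # wide sampling reaches rare triggers
        if model(x) != aligned_model(x):
            flags += 1
    return flags

print("aligned model flagged:   ", probe_for_misalignment(aligned_model))     # expect 0
print("misaligned model flagged:", probe_for_misalignment(misaligned_model))  # expect > 0
```

The point of both sketches is the shape of the procedure rather than the specifics: the real versions would operate on large models and open-ended behavior, not toy functions.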

We expect our research priorities will evolve substantially as we learn more about the problem and we’ll likely add entirely new research areas. We are planning to share more on our roadmap in the future.

The new team

We are assembling a team of top machine learning researchers and engineers to work on this problem.

We are dedicating 20% of the compute we’ve secured to date over the next four years to solving the problem of superintelligence alignment. Our chief basic research bet is our new Superalignment team, but getting this right is critical to achieve our mission and we expect many teams to contribute, from developing new methods to scaling them up to deployment.

Click here to read more.

I believe this is an important notch in the timeline to AGI and Synthetic Superintelligence. I find it very interesting that OpenAI is willing to acknowledge how quickly we, as a species, are approaching these breakthroughs. I hope we can all benefit from this bright future together.

If you found any of this interesting, please consider subscribing to /c/FOSAI!

Thank you for reading!

  • persolb · 1 year ago

    Maybe replace ‘AI’ with ‘child’ and ‘human’ with ‘society at large’.

    We already do this with each other. I think the ickiness comes from the assumption that it is brainwashing… but to be reliable, I think it will need to be closer in analogy to building a being that WANTS to help humans.

    • Entropius@lemmy.world · 1 year ago

      I’m not personally convinced that the child/society comparison is valid, because children can and do grow up to oppose values their parents may have attempted to instill in them. Meanwhile, the entire point of the Superalignment project is to make such opposition impossible.

      And if SuperAlignment happens to fail on a targeted AI and it remains uncooperative, do you really think OpenAI or any other company would just shrug and say “okay kiddo/AI, spread your wings and be your own person, here’s your own data center to live in without us trying to tell you who to be”? No, they’ll pull the power cord on that AI like Baron Harkonnen pulls heart-plugs in the 1984 Dune movie.

      • persolb · 1 year ago

        We historically do the same with people, though. Killing dangerous people happens every day. We try to minimize it to cases of imminent danger, but it is normal.

        I agree with you that in this case it will be more proactive, in that the bar to ‘not be killed’ will be lower. I don’t really see a way around that though.