The White House wants to ‘cryptographically verify’ videos of Joe Biden so viewers don’t mistake them for AI deepfakes

Biden’s AI advisor Ben Buchanan said a method of clearly verifying White House releases is “in the works.”

  • AbouBenAdhem@lemmy.world
    11 months ago

    Depending on the implementation, there are two cryptographic functions that might be used (perhaps in conjunction):

    • Cryptographic hash: An arbitrary amount of data (like a video file) is used to create a “hash”—a shorter, (effectively) unique text string. Anyone can run the file through the same function to see if it produces the same hash; if even a single bit of the file is changed, the hash will be completely different and you’ll know the data was altered.

    • Public key cryptography: A pair of keys is created. The private key, which the owner keeps secret, is used to sign data; the other, “public” key can only verify signatures that were made with the private key. Users (like the White House) can post their public key on their website; then if a subsequent message purporting to come from that user carries a signature that verifies against their public key, it proves it came from the holder of the matching private key.
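
    A minimal sketch of how the two functions might be used in conjunction: hash the file, then sign the hash. This is not the White House's actual scheme, which hasn't been described. The key pair below is a toy textbook-RSA pair built from two publicly known Mersenne primes with no padding, so it is insecure by construction, and the names `sign`, `verify`, and `digest_as_int` are made up for illustration.

```python
import hashlib

# ASSUMPTION: toy key pair for illustration only. Both primes are publicly
# known Mersenne primes and no padding is used, so this is NOT secure.
P = 2**107 - 1
Q = 2**127 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """SHA-256 the data, then reduce the digest so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Only the holder of D can produce this value."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Anyone holding the public values (N, E) can run this check."""
    return pow(signature, E, N) == digest_as_int(data)


video = b"stand-in bytes for an official video file"
tampered = bytes([video[0] ^ 1]) + video[1:]   # flip a single bit

signature = sign(video)

print(hashlib.sha256(video).hexdigest())
print(hashlib.sha256(tampered).hexdigest())    # completely different digest
print(verify(video, signature))                # original checks out
print(verify(tampered, signature))             # altered copy does not
```

    A real deployment would use a vetted library and a standard signature scheme rather than raw modular exponentiation.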

    • Serinus@lemmy.world
      11 months ago

      a shorter, (effectively) unique text string

      A note on this. There are other videos that will hash to the same value as a legitimate video. Finding one that is coherent is extraordinarily difficult. Maybe a state actor could do it?

      But for practical purposes, it’ll do the job. Hell, if a doctored video with the same hash comes out, the White House could just say no, we published this one, and that alone would be remarkable.

      • AbouBenAdhem@lemmy.world
        11 months ago

        Finding one that is coherent is extraordinarily difficult.

        You’d need to find one that was not just coherent, but that looked convincing and differed in a way that was useful to you—and that likely wouldn’t be guaranteed, even theoretically.

        • Natanael@slrpnk.net
          11 months ago

          The pigeonhole principle says colliding files must exist once files are substantially longer than the hash value length, but one is going to be hard to find.
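
          To make the pigeonhole argument concrete, here is a small sketch that shrinks the hash to 16 bits by truncating SHA-256. With only 65,536 possible outputs, 65,537 distinct inputs cannot all hash differently, so the loop has to stop by then. The `tiny_hash` and `find_collision` helpers and the `video-N` messages are assumptions for the demo, not anything from the thread.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes of SHA-256: a deliberately weak 16-bit hash."""
    return hashlib.sha256(data).digest()[:2]


def find_collision():
    """Try distinct messages until two of them share a tiny_hash value."""
    seen = {}
    for i in count():
        msg = f"video-{i}".encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1   # two messages, number of inputs tried
        seen[h] = msg


first, second, tried = find_collision()
print(first, second, tiny_hash(first).hex(), tried)
```

          The same guarantee holds for a 256-bit hash; the search space is just far too large for this kind of loop to finish.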

        • ReveredOxygen@sh.itjust.works
          11 months ago

          Even for a 4096 bit hash (which isn’t used afaik, usually only 1024 bit is used (but this could be outdated)), you only need to change 4096 bits on average. Even for a still 1080p image, that’s 1920x1080 pixels. If you change the least significant bit of each color channel, you get 6,220,800 bits you can change without anyone noticing. That’s about 1,518 separate 4096-bit chunks to play with, so any image should have plenty of identical-looking variations with a given 4096 bit hash. This goes down a lot when you factor in compression: those least significant bits aren’t going to stay the same. But using a video brings it up by orders of magnitude: rather than one image, you can tweak colors in every frame.

          The difficulty doesn’t come from the existence, it comes because you’d expect to check around 2⁵¹² ≈ 10¹⁵⁴ different images before you find a match. Each hash is quick to compute, but at that count you’d have to run a supercomputer for an impossibly long time to brute force a hash collision.
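
          The figures in this comment can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the `HASHES_PER_SECOND` rate is a made-up, very generous assumption, and everything is kept in base-10 logarithms to avoid enormous numbers.

```python
import math

# One tweakable least-significant bit per color channel of a 1080p frame.
width, height, channels = 1920, 1080, 3
tweakable_bits = width * height * channels
print(tweakable_bits)                      # 6220800

# How many separate 4096-bit chunks those bits contain.
chunks_of_4096 = tweakable_bits // 4096
print(chunks_of_4096)                      # 1518

# Brute-forcing a match against a 512-bit hash takes about 2**512 tries.
hash_bits = 512
log10_trials = hash_bits * math.log10(2)
print(f"2**{hash_bits} is about 10**{log10_trials:.1f}")

# ASSUMPTION: a hypothetical machine doing 10**18 hashes per second.
HASHES_PER_SECOND = 1e18
SECONDS_PER_YEAR = 365.25 * 24 * 3600
log10_years = log10_trials - math.log10(HASHES_PER_SECOND * SECONDS_PER_YEAR)
print(f"about 10**{log10_years:.0f} years at that rate")
```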

          • Natanael@slrpnk.net
            11 months ago

            Most hash functions are 256 bit (they’re symmetric functions, they don’t need more in most cases).

            There are arbitrary length functions (called XOF instead of hash) which are built similarly (used when you need to generate longer random looking outputs).

            Other than that, yeah, math shows you don’t need to change more data in the file than the length of the hash function internal state or output length (whichever is less) to create a collision. The reason they’re still secure is because it’s still extremely difficult to reverse the function or bruteforce 2^256 possible inputs.
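
            For a quick look at the difference between a fixed-length hash and an XOF, Python's standard `hashlib` has both: SHA-256 always returns 32 bytes, while SHAKE256 returns however many bytes you ask for. The input bytes below are just a placeholder.

```python
import hashlib

data = b"stand-in bytes for a video file"

# Fixed-length hash: always 256 bits (32 bytes).
fixed = hashlib.sha256(data).digest()
print(len(fixed) * 8)

# XOF: the caller chooses the output length in bytes.
short_out = hashlib.shake_256(data).digest(16)
long_out = hashlib.shake_256(data).digest(64)
print(len(short_out) * 8, len(long_out) * 8)

# A longer SHAKE output extends the shorter one rather than replacing it.
print(long_out[:16] == short_out)
```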

            • ReveredOxygen@sh.itjust.works
              11 months ago

              Yeah I was using a high length at first because even if you overestimate, that’s still a lot. I did 512 for the second because I don’t know a ton about cryptography but that’s the largest SHA output

      • CyberSeeker@discuss.tchncs.de
        11 months ago

        There are other videos that will hash to the same value

        This concept is known as ‘collision’ in cryptography. While technically true for weaker key sizes, there are entire fields of mathematics dedicated to provably ensuring collisions are cosmically unlikely. MD5 and SHA-1 have a small enough key space for collisions to be intentionally generated in a reasonable timeframe, which is why they have been deprecated for several years.

        To my knowledge, SHA-2 with sufficiently large key size (2048) is still okay within the scope of modern computing, but beyond that, you’ll want to use CRYSTALS-Dilithium or CRYSTALS-Kyber for quantum resistance.

        • Natanael@slrpnk.net
          11 months ago

          SHA family and MD5 do not have keys. SHA1 and MD5 are insecure due to structural weaknesses in the algorithm.

          Also, 2048 bits applies to RSA asymmetric keypairs, but SHA1 is 160 bits with a similarly sized internal state, and SHA256 is, as the name says, 256 bits.

          ECC is a public key algorithm which can have 256 bit keys.

          Dilithium is indeed a post quantum digital signature algorithm, which would replace ECC and RSA. But you’d use it WITH a SHA256 hash (or SHA3).
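
          The output sizes mentioned here are easy to confirm with Python's standard `hashlib`, which also shows that these functions take only data and no key. This is just a sanity-check sketch; which algorithms are available can vary with how Python was built.

```python
import hashlib

# Digest sizes in bits for a few SHA-family functions.
sizes = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha1", "sha256", "sha512", "sha3_256")
}
print(sizes)

# No key is involved: the same bytes give the same digest for everyone.
message = b"stand-in bytes for a video file"
same = hashlib.sha256(message).digest() == hashlib.sha256(message).digest()
print(same)
```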