• WetBeardHairs
    11 months ago

    That is glossing over how they process the data and transmit it to the cloud. The assistant wake word for “Hey Google” opens an audio stream to an off-site audio processor in order to handle the query, so that is easy to identify in network traffic because the transfer is immediate and large.

    The advertising wake words do not get processed that way. They are limited in scope and are handled by the same low-power hardware audio processor that listens for the assistant wake word. That wake word processor is an FPGA or ASIC, chosen specifically because it allows a customizable set of words to be matched in an extremely low-power form. When an advertising wake word is identified, it sends an interrupt to the CPU along with an enumerated value indicating which word was heard. The OS then stores that value and transmits a batch of them to a server at a later time. An entire day’s worth of advertising wake word data may be less than 1 kB, and it is sent along with other traffic.

    Good luck finding that in Wireshark.
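
    To make the size claim concrete, here is a minimal sketch in C of the batching scheme described above. Everything in it is hypothetical: the keyword IDs, the interrupt hook, and the upload step are illustrative names rather than any real vendor API, and it simply assumes the mechanism works as the comment describes.

    ```c
    /* Minimal, hypothetical sketch of the batching scheme described above.
     * The keyword IDs, interrupt hook, and upload step are illustrative
     * names only, not any real vendor API. */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    enum ad_keyword {            /* enumerated values the audio ASIC could report */
        KW_VACATION = 1,
        KW_CAR      = 2,
        KW_PIZZA    = 3,
    };

    struct kw_event {
        uint32_t timestamp;      /* seconds since epoch */
        uint8_t  keyword_id;     /* which enumerated word was heard */
    };

    static struct kw_event batch[128];   /* a day's worth stays around 1 kB */
    static size_t batch_len = 0;

    /* Hypothetical interrupt handler: the wake word processor raises an
     * interrupt and passes the enumerated keyword it matched; the OS just
     * records it. No audio leaves the device at this point. */
    void on_keyword_interrupt(uint8_t keyword_id)
    {
        if (batch_len < sizeof batch / sizeof batch[0]) {
            batch[batch_len].timestamp  = (uint32_t)time(NULL);
            batch[batch_len].keyword_id = keyword_id;
            batch_len++;
        }
    }

    /* Later, piggybacked on some other network request, the whole batch is
     * shipped in one tiny payload. Here we only report its size. */
    void flush_batch(void)
    {
        printf("uploading %zu events (%zu bytes)\n",
               batch_len, batch_len * sizeof(struct kw_event));
        batch_len = 0;
    }

    int main(void)
    {
        on_keyword_interrupt(KW_PIZZA);
        on_keyword_interrupt(KW_CAR);
        flush_batch();
        return 0;
    }
    ```

    Whether any device actually does this is a separate question; the point is only that such a payload would be trivially small and easy to bury in ordinary traffic.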

    • Septimaeus@infosec.pub
      11 months ago

      Hmm, that’s outside my wheelhouse. So you’re saying phone hardware is designed to listen for not just one wake word but a whole bank of predefined or reprogrammable ones? I hadn’t read about that yet, but it sounds more feasible than the constant-livestream idea.

      The Echo had the capacity for multiple wake words, IIRC, but I hadn’t heard of that for mobile devices. I’m curious how many of these keywords they can fit.