• GBU_28@lemm.ee
    10 months ago

    Huh? Running an image through an AI model to get semantically formatted output, and then consuming that output, is trivial now.

    • Barbarian@sh.itjust.works
      10 months ago

      Could you give me an example that uses live feeds of video data, or feeds the output to another system? As far as I’m aware (I could be very wrong! Not an expert), the only things that come close to that are things like OCR systems and character recognition. Describing in machine-readable actionable terms what’s happening in an image isn’t a thing, as far as I know.

      • GBU_28@lemm.ee
        10 months ago

        No, not live video; that didn’t seem to be the topic.

        But if you had the horsepower, I don’t think it’s impossible, based on what I’ve worked with. From a bottleneck standpoint, it’s just a matter of snipping and distributing the images.

        • Barbarian@sh.itjust.works
          10 months ago

          “No live videos”

          Well, that’d be a prerequisite for a transformer model making decisions for a ship-scuttling robot, which is why I brought it up.

      • FooBarrington@lemmy.world
        10 months ago

        “Describing in machine-readable actionable terms what’s happening in an image isn’t a thing, as far as I know.”

        It is. That’s actually the basis of multimodal transformers: they have a shared embedding space for multiple modalities (e.g. text and images). If you encode an input and take the resulting embedding, you have a vector describing the contents of that input.
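
To make the shared-embedding idea concrete, here is a minimal sketch of how an image embedding could be turned into machine-readable output by comparing it against text embeddings in the same space. Everything specific in it is an assumption for illustration: the three-dimensional vectors are hand-made stand-ins for what a real multimodal encoder (CLIP is one example) would produce, and the names `TEXT_EMBEDDINGS`, `cosine_similarity`, and `describe` are hypothetical, not taken from any library.

```python
import math

# ASSUMPTION: hand-made 3-d vectors standing in for real encoder output.
# A real multimodal model would compute these from raw text and pixels.
TEXT_EMBEDDINGS = {
    "a ship at sea": [0.9, 0.1, 0.0],
    "a cat on a sofa": [0.0, 0.2, 0.9],
    "a city street": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def describe(image_embedding, text_embeddings=TEXT_EMBEDDINGS):
    """Return the closest text label and its score as a plain dict."""
    scores = {
        label: cosine_similarity(image_embedding, vector)
        for label, vector in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return {"label": best, "score": scores[best]}


# ASSUMPTION: pretend this vector came from encoding a photo of a freighter.
image_embedding = [0.8, 0.2, 0.1]
result = describe(image_embedding)
print(result["label"])
```

Because text and images land in the same vector space, the nearest text label acts as a structured description that another system can consume directly. A real pipeline would swap the hard-coded vectors for actual encoder output and would likely use a much larger label set.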