Google especially is working on multimodal models that do both language and image, audio, etc understanding in the same model. Their latest work, PaLM-E, demonstrates that learning in one domain (eg images) can indirectly benefit the model’s performance in other domains (eg text) without additional training in the other domain.
You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !technology@lemmy.ml
Subscribe to see more stories about technology on your homepage
Google especially is working on multimodal models that do both language and image, audio, etc understanding in the same model. Their latest work, PaLM-E, demonstrates that learning in one domain (eg images) can indirectly benefit the model’s performance in other domains (eg text) without additional training in the other domain.