Boston Dynamics has turned its robot dog into a talking tour guide using AI, as seen in a somewhat unsettling video the company posted. It used OpenAI’s ChatGPT API, along with some open-source large language models (LLMs), to carefully shape its responses. It then outfitted the bot with a speaker, added text-to-speech capabilities, and made its mouth mimic speech “like the mouth of a puppet.”
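Based on the description above, the pipeline is roughly: short personality prompt → LLM-generated line → text-to-speech → speaker. Here is a minimal Python sketch of that flow, with placeholder functions standing in for the real chat-completion API and TTS calls; all function names and prompt wording here are hypothetical, not Boston Dynamics’ actual code:

```python
def build_prompt(personality: str, context: str) -> str:
    """Combine a short personality description with scene context."""
    return (
        f"You are a tour-guide robot with this personality: {personality}. "
        f"Describe what you see: {context}"
    )

def generate_line(prompt: str) -> str:
    """Placeholder for the LLM call (e.g. a chat-completion API request).
    A real implementation would send `prompt` to the model and return its reply."""
    return f"[LLM reply to: {prompt}]"

def speak(text: str) -> None:
    """Placeholder for text-to-speech routed to the robot's speaker,
    synchronized with the 'puppet mouth' motion of the gripper."""
    print(text)

# Example run with one of the personalities from the video:
prompt = build_prompt("a Shakespearean time traveller", "the charging dock")
line = generate_line(prompt)
speak(line)
```

The point of the sketch is how little of it touches the robot itself: only the final `speak` step involves the hardware, which is why the same setup transfers to almost any device with a speaker.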
The version speaking in a British accent and the one playing a Shakespearean time traveller had me 😂 but it’s certainly a little unsettling overall.
As a minor aside to the story, I find it interesting that the devs tried to make the robotic dog ‘talk’ by opening and closing its ‘mouth’. Cinema such as Star Wars has proven that humans don’t need an android or robot to actually move its mouth when it speaks in order to have a conversation.
I also can’t help but wonder what has happened at Boston Dynamics - these robotic dogs were/are a wonder, but they aren’t new. They are scary with how agile they are, but this doesn’t seem any different from a thousand other ‘I integrated ChatGPT into my R/C car’ projects that have come out over the past year. Has innovation at Boston Dynamics stalled?
Boston Dynamics’ YouTube channel has been filled with silly videos. Oftentimes they serve a dual function: 1. build brand awareness through fun videos, and 2. show the versatility of the onboard systems. In this case they are showing off the ability to navigate a real-world human environment, and the sensors/cameras whose output can be fed into other systems for advanced decision-making and planning.
Also, the robots embracing the “personalities” was interesting (for someone like me who doesn’t have technical knowledge of LLMs etc) as well as entertaining.
They also uploaded this video a day after they showed off new functionality on their “Stretch” robot which more directly impacts livelihoods, as Stretch isn’t cute. Stretch is meant to replace menial labor.
Spot is cute and proactive. Stretch is what Boston Dynamics is actually selling.
How do you mean the robots embraced the personalities?
From what I understood, a short prompt describing a personality was provided, based on which the LLM generated lines that were converted into speech and conveyed to the listener through speakers. (If some technicalities are incorrect, feel free to correct me.) I used “embraced” kind of metaphorically. The robots themselves didn’t literally embrace a personality.
Then I agree. I guess that’s why I didn’t find this very interesting - you could strap speakers and ChatGPT to anything, really; it has very little to do with the robot.
It’s just entertaining, at least on the video.
They are trying to move from “making an impressive video for show” to “solving actual, real, and useful applications”. So it obviously takes a long time to produce new results that demonstrate this.