Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.
as much as the speech-to-text gets wrong on my phone, I can only imagine what it does with doctors’ notes.
one of my million previous jobs was in medical transcription, and it is so easy to misunderstand things even when you have a good grasp of specialty-specific terminology and basic anatomy.
they enunciate the shit they’re recording about your case about as well as they write legibly. you really have to get a feel for a doctor’s speaking style and common phrases to not turn in a bunch of errors.
Edit: oh yeah, ✨ innovation ✨
Edit 2: it gets better and better
Edit 3: wonder if the Organ Procurement Organizations are going to try to use this to blame for the extremely fucked up shit that’s been happening
I’ve been using Whisper with TankieTube and I’m curious whether these errors were made with the Large-v2 or the Large-v3 model. I suspect it was the latter, because its dataset includes output from the other.
Snake eating its own tail, etc.
In your experience, has Whisper Large-v3 been much worse than v2?
I haven’t done any comparing; I just went with the apparent consensus, which is that v2 was more accurate and hallucinated less.
Who did they train it on, Trump, Biden, or any other of the geriatric ghouls in DC?
How can a transcription tool be so bad? YouTube doesn’t get things this wrong.
Probably audio quality. I can’t imagine the acoustics in a hospital room, or the hallway outside one, are anything close to those of most YouTube videos recorded with a professional mic.
sometimes they go into a tiny little office so they can concentrate better, and it’s so much easier to hear those docs