I’m using espeak (from F-Droid) for text to speech, and it’s working great. I’d like an app that does speech to text though, ideally supporting Swedish as well as English for Duolingo purposes, but even just English would be more than I have now.

  • rufus@discuss.tchncs.de
    5 months ago

    I think that’s it. The two things mentioned in the previous comments are also the ones I’ve seen floating around: Sayboard and FUTO’s Voice Input. The former is free software, while FUTO releases under a source-available license. Additionally, you can use something like Kõnele (available in F-Droid) to connect to cloud-based services. Disregarding free software, there are probably a few others with a proprietary license, for example Google’s STT that is baked into their Android versions.

    • Pantherina@feddit.de
      5 months ago

      Google’s “speech services” works on GrapheneOS using sandboxed Play services.

      Until they add a permission to restrict inter-process communication (IPC), as is possible on Linux with Flatpak, this might be too big of a privacy problem though.

      Also a lot of Google stuff is basically a proprietary cloud adapter.

      • rufus@discuss.tchncs.de
        5 months ago

        Yeah, I don’t know if OP was looking for that. They specified ‘FOSS’ in the title. But I think Google can also do local STT nowadays; I haven’t tried it for quite a while. Sayboard and FUTO work remarkably well. I personally am struggling a bit more with the reverse part: TTS. There isn’t much except for espeak if you want languages other than English (and maybe Russian, since there is another project that does a few other languages). But I skipped the Google services on my phone.

          • rufus@discuss.tchncs.de
            5 months ago

            Sure. Seems the Thorsten voice is in every FLOSS text-to-speech project. I think he (the real Thorsten) also does YouTube videos about that topic.

            I don’t know of any other free software Android speech software (that also speaks German) except for espeak. And I need something that can talk to me in the car. For other purposes it sounds a bit rough in my opinion. Eventually I would like something more state of the art with a more human-like sound. And something that properly ties into my Linux desktop and brings local STT and TTS to every application. I think the components are there already. But we’re still missing the proper integration into both platforms. (And maybe a few more voices and training data for new ones in several languages.)

            • Pantherina@feddit.de
              5 months ago

              Crazy thing is, this is not magic.

              There are dozens of companies doing that, some even for your own voice.

              You just need all the phonemes (German: Phonem), so you read a specific text multiple times and you can have your own custom voice.

              This just needs some funding, and the Mozilla Common Voice project should already have more than enough data.

              • rufus@discuss.tchncs.de
                5 months ago

                I think we’re way past that. I’ve fiddled around a bit with ‘bark’ and another more common (open?) solution to do voice cloning. It takes something like a 5-second audio clip of someone talking, extracts features from that, trains an AI model and transfers the ‘style’ of that voice to arbitrary speech. I don’t really know if it’s technically similar to the AI tools that can paint an astronaut on a horse and draw it in the style of Van Gogh… but it’s the same idea. And bark and other tools can also synthesize speech with an AI model. You can just give it text and instruct it to talk in a relaxed female voice, and it’ll do it. However, I wasn’t able to get good results out of it. It’s nice to play around with, but it’s not yet feasible for real-world use. And it takes a proper graphics card (or a cloud service that provides you with GPU compute) to run it.

                I don’t think these tools use phonemes and the old-fashioned ways of doing it. It is machine learning and AI ‘magic’ that makes those tools sound more smooth and realistic.

                What I also like is coqui-ai. It seems to be entirely free, and the samples sound like a completely different level compared to established tools like espeak-ng. Sadly, it isn’t packaged in any of the Linux distributions I use, and I really don’t understand why. It also doesn’t need crazy system specs. But it doesn’t tie into the desktop at all and requires you to set up conda environments and handle the CUDA libraries, and just running the ‘pip install TTS’ they list in their GitHub repo didn’t do it for me.

                (I excluded the commercial tools here. Big Tech has some alright TTS. Google, Amazon, Apple, … they’re all usable. elevenlabs.io offers exceptionally good TTS; I think that’s what the AI-narrated YouTube videos are made with. And I sometimes use the button to convert heise online articles to speech while doing the laundry or other stuff in the house that doesn’t take enough time for me to start a podcast. I just wish there was a button on my laptop that’d do the same thing with free software and offer similar quality.)

                [Edit: Forget what I said last. I’ve been distro-hopping lately and it seems coqui-ai/TTS is available in the Linux distribution I installed last week. I’m going to try it tomorrow.]
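
For anyone who does get coqui-ai/TTS installed, here is a minimal sketch of driving its `tts` command-line tool from Python. It assumes the `tts` executable is on your PATH and that the German Thorsten model name below is still offered by the project; the helper names `build_tts_command` and `speak_to_file` are made up for this example and are not part of the library.

```python
import shutil
import subprocess

# Assumption: this model name is still listed by `tts --list_models`.
DEFAULT_MODEL = "tts_models/de/thorsten/tacotron2-DDC"


def build_tts_command(text, out_path, model_name=DEFAULT_MODEL):
    """Build the argument list for the Coqui `tts` CLI (hypothetical helper)."""
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--out_path", out_path,
    ]


def speak_to_file(text, out_path, model_name=DEFAULT_MODEL):
    """Synthesize `text` into a WAV file; return False if `tts` is not installed."""
    if shutil.which("tts") is None:
        # Coqui TTS is not on PATH, so there is nothing to run.
        return False
    # The first run downloads the model, which can take a while.
    subprocess.run(build_tts_command(text, out_path, model_name), check=True)
    return True
```

Calling `speak_to_file("Hallo Welt", "hallo.wav")` should then write a WAV file you can play with any audio player. It does nothing for desktop integration, it only wraps the CLI call.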

                • Pantherina@feddit.de
                  5 months ago

                  Packaging is a big thing. On Android the model needs to be integrated in a surrounding modern app using modern libraries.

                  I wouldn’t be too hyped about training an AI with really little data, but if it’s substantial, this is probably crazy cool.

  • ExtremeDullard@lemmy.sdf.org
    5 months ago

    FUTO Voice Input. Hands down:

    • It works fantastically well
    • It works offline - in other words, Google doesn’t get to spy on what you say
    • It supports Swedish

    FUTO asks you to pay for the app but doesn’t force you to. You get the whole application regardless. Just for not treating users like crap and for releasing such nice apps, you really should pay them.

    If you do, make sure you download the APK from F-Droid or directly from them, so Google doesn’t get any of your money: the APK served by the Google Play Store uses the Play Store to collect payment, whereas the APK served by F-Droid and the direct-download APK allow you to send FUTO money directly with Stripe (credit card).

      • ExtremeDullard@lemmy.sdf.org
        5 months ago

        Interesting! Thank you for the link.

        To be honest, I am not a lawyer, so those issues didn’t jump out at me when I quickly read through the (very terse) license.

        Also, FUTO seem like decent people, and trustworthy offline voice input software that lets users escape Google surveillance is hard to come by. FUTO Voice Input is pretty much the only game in town, and the fact that it comes with source code is amazing to me. So I kind of overlooked the finer points, to be honest, because it surprisingly ticks all the other boxes that matter to me.

    • N4CHEM
      5 months ago

      RHVoice is text-to-speech (TTS); what OP is asking for is the opposite: speech-to-text (aka voice recognition).