• Deebster
    12 points · 10 months ago

    Even ignoring privacy arguments, I think voice control is a great use case for running services locally: lower latency from not having to upload your voice sample, plus the option of having it learn your accent, is very attractive.

    That said, voice control is irritatingly error-prone and seems to be slower than just reaching for the remote control. I agree that automatic stuff would be best, but there are some things you can’t write rules for.

    Something that would be interesting is a more eye- and gesture-based system: I’m thinking of something like looking at the camera and slicing a hand across your throat to stop, or pinching your fingers together to reduce the volume. This is definitely one to run locally, for both privacy and performance reasons.

    • @oDDmON@lemmy.world
      8 points · 10 months ago

      Assistive technology has been focused on this for a while.

      My brother had severe cerebral palsy, and for years (the 80s and 90s) he communicated via analog technology: a literal alphabet-and-icon communication board, which he could tap with a head wand. By 2000 he had a digital voice, but he still had to use the wand.

      Stephen Hawking demonstrated eye-sensing technology almost as soon as it was invented, and that was over a decade ago.

      In most cases there is a definite aspect of “bespokeness” to implementing assistive communication technology for consumers, but the barriers to implementing the same for an able-bodied audience would appear to be much lower.

    • Tippon
      1 point · 10 months ago

      But where do you put the camera? If you’re sitting in front of the TV, then near the TV makes sense. What if you’re sitting facing a different direction with a book though? What if your hands are full?

      A camera based system would be much more limited, and probably wouldn’t work in the dark.

      • Deebster
        1 point · 10 months ago

        You’re assuming that we can’t have both. Why not have it as a complementary input?

        I think looking at a device and talking is better than saying “hey $brandname” before everything, but having both would be better still.
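A minimal sketch of the complementary-input idea discussed in this thread, in Python. It applies three rules. Gestures only count while you are looking at the camera. Looking at the device arms voice input without a wake word. The wake word stays available as a fallback for when you are facing elsewhere, for example when reading a book. Everything here is hypothetical: the gesture labels, `InputFrame`, `resolve_command`, the 0.8 confidence threshold and the placeholder wake word are illustrative assumptions. The gaze, gesture and speech recognisers that would populate each frame are assumed to run locally and are not shown.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical gesture labels mapped to media commands (illustrative only).
GESTURE_COMMANDS = {
    "throat_slice": "stop",
    "pinch": "volume_down",
}

# Placeholder for the "hey $brandname" wake word (hypothetical).
WAKE_WORD = "hey brandname"


@dataclass
class InputFrame:
    """One snapshot of what the (assumed, local) sensors report."""
    looking_at_device: bool = False
    gesture: Optional[str] = None
    gesture_confidence: float = 0.0
    transcript: Optional[str] = None  # text from a local speech recogniser, if any


def resolve_command(frame: InputFrame, min_gesture_conf: float = 0.8) -> Optional[str]:
    """Return the command to execute, or None if the input should be ignored."""
    # Gestures only count while the user is looking at the camera, and only
    # above a confidence threshold, to reduce accidental triggers.
    if frame.looking_at_device and frame.gesture in GESTURE_COMMANDS:
        if frame.gesture_confidence >= min_gesture_conf:
            return GESTURE_COMMANDS[frame.gesture]

    if frame.transcript:
        text = frame.transcript.strip().lower()
        # Gaze arms voice input directly: no wake word needed.
        if frame.looking_at_device:
            return text or None
        # Facing away (e.g. reading): fall back to requiring the wake word.
        if text.startswith(WAKE_WORD):
            return text[len(WAKE_WORD):].strip(" ,") or None

    return None
```

For example, `resolve_command(InputFrame(transcript="Hey brandname, pause"))` returns `"pause"`, while the same request without the wake word and without gaze, `resolve_command(InputFrame(transcript="pause"))`, returns `None`. Keeping the wake word as a fallback rather than the only trigger is what makes the two inputs complementary instead of competing.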