Hey all,

Some might remember this from about 9 months ago. I’ve been running it with zero maintenance since then, but saw there were some new updates that could be leveraged.

What has changed?

  • Jellyfin is supported (in addition to Plex and Tautulli)
  • Moved away from whisper.cpp to stable-ts and faster-whisper (faster-whisper can support Nvidia GPUs)
  • Significant refactoring of the code to make it easier to read and for others to add ‘integrations’ or webhooks
  • Renamed the webhook endpoint from webhook to plex, tautulli, or jellyfin (one per integration)
  • New environment variables for additional control

What is this?

This will transcribe your personal media on a Plex or Jellyfin server to create subtitles (.srt). It is currently reliant on webhooks from Jellyfin, Plex, or Tautulli. This uses stable-ts and faster-whisper which can use both Nvidia GPUs and CPUs.
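For anyone who has not looked inside an .srt file: it is just numbered cues, each with a time range and the text. Here is a rough sketch of turning transcription segments into that format. It is an illustration only, not subgen's code; the `(start, end, text)` segment shape and the function names are assumptions.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape for this sketch: (start seconds, end seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT cues: index, time range, text, then a blank line."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Each cue already ends with a newline, so joining on "\n" leaves one blank line between cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.25, "Hello"), (1.25, 2.0, "World")]))
```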

How do I run it?

I recommend reading through the documentation in the McCloudS/subgen repository on GitHub ("Autogenerate subtitles using OpenAI Whisper Model via Jellyfin, Plex, and Tautulli"). The quick and dirty version: pull mccloud/subgen from Docker Hub, configure the Tautulli/Plex/Jellyfin webhooks, and map your media volumes so the paths inside the container match the paths Plex/Jellyfin uses exactly.
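Why the volume mapping matters, as a rough sketch: assuming the webhook hands over the file path as the media server sees it, the container can only open that file if the same path exists inside it. The function name and the `Movie.en.srt` sidecar naming below are assumptions for illustration, not necessarily what subgen does.

```python
from pathlib import Path
from typing import Optional


def subtitle_path_for(media_path: str, language: str = "en") -> Optional[Path]:
    """Return where a sidecar subtitle would be written, or None if the file is not visible.

    media_path is the path as reported by the media server (an assumption of this
    sketch). If the volumes are mapped differently in this container, the file
    will not exist at that path and there is nothing to transcribe.
    """
    media = Path(media_path)
    if not media.is_file():
        return None
    # Hypothetical naming: "Show.S01E01.mkv" -> "Show.S01E01.en.srt" next to the video.
    return media.with_name(f"{media.stem}.{language}.srt")


if __name__ == "__main__":
    # Placeholder path; prints None unless this file exists on your system.
    print(subtitle_path_for("/media/tv/Show/Show.S01E01.mkv"))
```

If this kind of check returns None for a file you know exists on the server, the volume paths almost certainly differ between the two containers.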

What can I do?

I’d love any feedback or PRs to update any of the code or the instructions. I’m also interested to hear if anyone can get GPU transcription to work. I have a Tesla T4 in the mail to try it out soon.

  • Maribel-han@alien.topB
    1 year ago

    How do I know if it’s working/doing its thing? I installed it but it seems to be doing nothing.

  • TheBigC@alien.topB
    1 year ago

    This looks very cool, I am interested. Do I install it on the Plex server itself, or on a PC running a Plex client?

  • Vogete@alien.topB
    1 year ago

    I didn’t know this project existed and I was genuinely thinking of making this tool. This is amazing, thank you! I’ll definitely try it out, especially since I have a hard time finding properly synced subtitles for a lot of shows.

  • tablecontrol@alien.topB
    1 year ago

    Holy crap!! I’m going to try this tonight.

    I was having some subtitle timing issues on Breaking Bad that were driving me nuts.

  • Snuupy@alien.topB
    1 year ago

    Wow, this is great! I’d be interested in doing some subtitles for some non-English shows I have, would you happen to know if translating into English subtitles is supported?

    Also, take a look at https://github.com/m-bain/whisperX - subsai uses this and it’s much faster than whisper.cpp

    • McCloud@alien.topOPB
      1 year ago

      It should detect the foreign language and make English subtitles, but I haven’t personally tried it.

      I’m not using whisper.cpp anymore. I did some short comparisons between WhisperX and stable-ts and ultimately decided to go with stable-ts. Functionally, I’m sure they’re very similar.

  • nullx@alien.topB
    1 year ago

    Like just last week I set up bazarr and was delighted to learn that it has a similar feature to this and it works great (with a GTX 1070)… I would have set your project up in lieu of bazarr, but I liked how bazarr searches other sources and does a lot of other stuff in regards to also fixing+syncing existing subtitles.

    Do you have any plans on anything similar to these bazarr features or maybe potentially even creating a provider for bazarr?

  • fefeh1@alien.topB
    1 year ago

    I have a suggestion. I have installed it and it seems to be working, but I don’t know which file it is working on at the time. I look at the logs and I can see where it is determining the language and translating and transcribing, but I have no idea which movie/show it is processing.

    Thanks for the great app!

    • McCloud@alien.topOPB
      1 year ago

      Unfortunately, stable-ts and whisper don’t clearly output which file they are working on, so you’re dependent on trying to decipher it from the logs. I tried to add prints to show which files have been queued and started, but with threading, the stdout sometimes gets lost or buffered in strange ways.
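One possible way around lost prints, sketched here under assumptions: Python's `logging` handlers take a lock around each write, and `StreamHandler` flushes after every record, so messages from worker threads tend to come out whole. The queue/worker layout, the logger name, and `transcribe_stub` are all hypothetical stand-ins, not how subgen is actually structured.

```python
import logging
import queue
import threading

log = logging.getLogger("transcribe-demo")  # hypothetical logger name


def transcribe_stub(path: str) -> None:
    """Stand-in for the real transcription call (an assumption for this sketch)."""


def worker(jobs: queue.Queue) -> None:
    """Pull file paths off the queue and log before and after each job."""
    while True:
        path = jobs.get()
        if path is None:  # sentinel: no more work for this thread
            jobs.task_done()
            break
        log.info("started %s", path)
        try:
            transcribe_stub(path)
            log.info("finished %s", path)
        except Exception:
            log.exception("failed %s", path)
        finally:
            jobs.task_done()


def run(paths, threads: int = 2) -> None:
    """Queue every path, process them on worker threads, and wait for completion."""
    jobs: queue.Queue = queue.Queue()
    pool = [threading.Thread(target=worker, args=(jobs,)) for _ in range(threads)]
    for t in pool:
        t.start()
    for p in paths:
        log.info("queued %s", p)
        jobs.put(p)
    for _ in pool:
        jobs.put(None)  # one sentinel per worker
    for t in pool:
        t.join()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(threadName)s %(message)s")
    run(["/media/example-1.mkv", "/media/example-2.mkv"])
```

Including `%(threadName)s` in the format makes it easier to tell which worker picked up which file.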

  • Kaikidan@alien.topB
    1 year ago

    The app works perfectly, really nice idea! But I noticed something on my install. The GitHub page mentions that it will transcribe into English from other languages, but I tried Japanese and Portuguese files and they got transcribed in their native language:

    portuguese > portuguese

    japanese > japanese

    english > english

    Is that the expected behavior, or should I add some argument to the Docker Compose file to force translation into English?
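For context on the transcribe vs. translate question: Whisper models have two tasks, "transcribe" (output in the spoken language) and "translate" (output in English), and faster-whisper exposes this through the `task` argument of `transcribe`. Below is a rough sketch of calling it directly. This is not subgen's code, and whether subgen exposes a setting for this is not covered here; the function names, model size, and CPU/int8 settings are assumptions.

```python
def pick_task(want_english: bool) -> str:
    """Map the desired output to a Whisper task.

    'transcribe' keeps the spoken language; 'translate' produces English.
    """
    return "translate" if want_english else "transcribe"


def subtitle_segments(media_path: str, want_english: bool = True, model_size: str = "medium"):
    """Run faster-whisper on one file and return (start, end, text) tuples."""
    # Imported lazily so pick_task() works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Device and compute type are assumptions; a GPU setup would use different values.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(media_path, task=pick_task(want_english))
    print(f"detected language: {info.language}")
    return [(s.start, s.end, s.text) for s in segments]
```

With `want_english=False`, a Japanese file would come out as Japanese subtitles, which matches the behavior described above.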

    • McCloud@alien.topOPB
      1 year ago

      If I knew what the endpoints were, nothing would prohibit it. I can add it to my short list.

      • McCloud@alien.topOPB
        1 year ago

        I just tried; Emby won’t actually send out the webhook on an action. I can use the test webhook, but it won’t trigger off media actions. The documentation half-implies that it’s a Premiere option?

  • Adam_Meshnet@alien.topB
    1 year ago

    Nice! Do you reckon with GPU you could potentially run it in real time? I’ve set up an endpoint with Whisper to transcribe videos one of my colleagues needed for work on my homelab server, which cumulatively must have saved everyone days worth of time by now.

    • McCloud@alien.topOPB
      1 year ago

      I’m not sure yet. Faster-whisper has some benchmarks of the large-v2 model taking about 1 minute for 13 minutes of audio. Smaller models ought to be quicker. I’m unsure whether the specs of the GPU will make much difference.
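To put that benchmark figure in real-time terms, here is a back-of-the-envelope sketch. It assumes throughput scales linearly with audio length and that your hardware matches the benchmark, neither of which is guaranteed; the function names are made up for illustration.

```python
def realtime_factor(audio_minutes: float, processing_minutes: float) -> float:
    """How many minutes of audio are processed per minute of compute."""
    return audio_minutes / processing_minutes


def estimated_minutes(audio_minutes: float, factor: float) -> float:
    """Rough processing time for a file, given a real-time factor."""
    return audio_minutes / factor


if __name__ == "__main__":
    factor = realtime_factor(13, 1)  # the figure quoted above: 13 min of audio in about 1 min
    print(f"real-time factor: {factor:.0f}x")
    print(f"45-minute episode: about {estimated_minutes(45, factor):.1f} minutes")
```

Anything above a factor of 1x keeps up with playback in principle, though startup time and model loading are ignored here.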

  • PoundKitchen@alien.topB
    1 year ago

    Suhweeet!!! English only or will it handle other languages and translation too, Spanish to English?