June 23, 2024

Whisper: Free Speech-to-Text Engine

Friday, September 23, 2022, Ralph Hersel

Sometimes I get really excited. That happened last night, when news of a free speech-to-text engine scrolled through my timeline. We know the company OpenAI as a leader in practical applications of speech recognition and “artificial art”. Maybe you have already tried DALL-E to turn your thoughts into paintings. The San Francisco company is even better known for its language model GPT-3.

All of OpenAI’s previous products were impressive business services but they weren’t available to everyone and also required a great deal of experience to be used in everyday life. That changed last night. The company released its new language model, Whisper, under the MIT license. I tried it right away.

What does this thing do? It’s a neural network (don’t call it an AI) that translates speech (in the form of an audio file) into written text. Why do you need it? For example, to turn meetings into written minutes, or transcribe podcasts.

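Whisper comes in five model sizes:

  • Tiny (tiny): 39 million parameters
  • Base (base): 74 million parameters
  • Small (small): 244 million parameters
  • Medium (medium): 769 million parameters
  • Large (large): 1550 million parameters

The four smaller models are also available as English-only versions; the large model is multilingual only.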
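The names that Whisper accepts for its model option follow from this list. As a minimal sketch in Python (the dictionary and the helper `model_name` are illustrative and not part of Whisper; the parameter counts are the ones given in the project README), this builds such a name and appends `.en` for the English-only variants of the four smaller models:

```python
# Parameter counts in millions, as listed in the Whisper README.
MODEL_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def model_name(size: str, english_only: bool = False) -> str:
    """Return a model name such as 'medium' or 'medium.en'.

    Hypothetical helper for illustration; it is not part of Whisper.
    """
    if size not in MODEL_PARAMS_M:
        raise ValueError(f"unknown model size: {size}")
    if english_only:
        if size == "large":
            # The large model is multilingual only.
            raise ValueError("the large model has no English-only variant")
        return f"{size}.en"
    return size


print(model_name("medium"))                     # medium
print(model_name("medium", english_only=True))  # medium.en
```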

Until now, this task meant relying on cloud services that you might prefer not to trust, whether because of the cost or because of your data. The good news is that Whisper is a) free software, b) very easy to use, and c) delivers convincing results. If you are interested in the details of the system, you can read up on them at the source. I am interested in whether it works.


Until now, you could build an STT engine from services, models, and instructions based on PyTorch and Hugging Face Transformers, provided you had a lot of experience. But this was hardly feasible for the casually interested user. Now the tide has turned, and that’s a good thing.

First, check whether ffmpeg is installed on your system, which it usually is. Just type ffmpeg in the terminal: either you will see its output, or you will be prompted to install it if it is missing.
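A script can run the same check before it starts any work. This is a minimal sketch in Python using only the standard library; the helper name `find_tool` is made up for this example:

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


path = find_tool("ffmpeg")
if path:
    print(f"ffmpeg found at {path}")
else:
    print("ffmpeg is missing; install it with your package manager")
```

Unlike typing ffmpeg by hand, this only looks the program up on the search path and does not start it.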

Then create a subdirectory with any name, for example whisper. Now change to this directory in the terminal: cd whisper. The installation works like this:

pip install git+https://github.com/openai/whisper.git   

If you do not have the Python installer pip, you can install it from your distribution’s software store. That’s it. Nothing else needs to be installed. What you need now is an audio file in English. For my experiment, I extracted about one minute from the latest episode of the Late Night Linux podcast. You should listen to this file so that you can compare it with the transcript.

In the next step, this audio file is converted to text. To do this, save the audio file (or any other audio file in English) under the name latenightlinux.mp3 in the directory you created earlier and run this simple command:

whisper latenightlinux.mp3 --model medium 

It is self-explanatory: whisper processes the file latenightlinux.mp3 using the medium model (769 million parameters). Now you must be patient. Depending on the performance of your computer, it will take about 15 minutes to generate the text. You can follow the progress in the terminal.

While whisper does its job, you can follow the transcription in 30-second steps in the terminal. At the end, there is also a text file in the directory: latenightlinux.mp3.txt

But to distract ourselves from this, this is a very sad day. Let’s talk about our discoveries. Will, what is a navidrom? Previously in Discoveries I had learned how to take my audiobook out of the audio and convert it to mp3 on m4a or whatever you want, which is fine and fine but I don’t want to carry something like. I don’t know the value of seven or eight carts. It’s a very long part on my phone like my expensive phone storage. Cloud storage is very cheap and my data plan on my phone is very cheap so what I want to do is store it in the cloud and stream it to my phone like you would with Spotify or one of those things.
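The same transcription can also be started from Python instead of the command line. The following is a sketch, not a definitive implementation: `whisper.load_model` and `model.transcribe` are the calls documented in the Whisper README, while `format_timestamp`, `format_segments`, and `transcribe` are helper names invented for this example, and the timings in the sample data are made up.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Render Whisper-style segments as one timestamped line each."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe(audio_path: str, model_size: str = "medium") -> str:
    """Transcribe an audio file and return the text with timestamps."""
    import whisper  # imported lazily; requires the pip install shown earlier

    model = whisper.load_model(model_size)  # downloads the model on first use
    result = model.transcribe(audio_path)   # dict with "text" and "segments"
    return format_segments(result["segments"])


# Made-up sample segment (illustrative timings) to show the output format.
sample = [{"start": 0.0, "end": 4.5, "text": " Let's talk about our discoveries."}]
print(format_segments(sample))  # [00:00 -> 00:04] Let's talk about our discoveries.

# With the package installed and the audio file in place:
# print(transcribe("latenightlinux.mp3"))
```

Keeping the formatting separate from the model call means the output layout can be tried out on sample data without waiting for a full transcription run.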

You can now put on headphones and compare the audio file with the written text. I think you’ll come to the same conclusion as me: It’s pretty good. Now think about how you can use Whisper for your purposes.
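If you have a reference transcript that you trust, the comparison can also be expressed as a number. One common measure is the word error rate (WER): the word-level edit distance between the reference and the generated text, divided by the number of reference words. The sketch below is an assumption about how you might check a result yourself; it is not part of Whisper, and the function name `word_error_rate` is made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # word missing in hypothesis
                           cur[j - 1] + 1,       # extra word in hypothesis
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# One of five reference words is missing in the hypothesis.
print(word_error_rate("cloud storage is very cheap", "cloud storage is cheap"))  # 0.2
```

This simple version compares lowercase words split on whitespace, so differences in punctuation count as errors.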

source: https://openai.com/blog/whisper/