Whisper: Free Speech-to-Text Engine

Friday, September 23, 2022, Ralph Hersel

Sometimes I get really excited. That happened last night, when news of a free speech-to-text engine scrolled through my timeline. We know the company OpenAI as a leader in practical applications of speech recognition and “artificial art”. Maybe you have already tried DALL-E to turn your thoughts into paintings. The San Francisco company is much better known for its language model GPT-3.

All of OpenAI’s previous products were impressive business services but they weren’t available to everyone and also required a great deal of experience to be used in everyday life. That changed last night. The company released its new language model, Whisper, under the MIT license. I tried it right away.

What does this thing do? It’s a neural network (don’t call it an AI) that converts speech (in the form of an audio file) into written text. Why do you need it? For example, to turn meetings into written minutes, or to transcribe podcasts.

Whisper comes in five model sizes:

Tiny (tiny): 39 M parameters
Base (base): 74 M parameters
Small (small): 244 M parameters
Medium (medium): 769 M parameters
Large (large): 1550 M parameters

Except for the large model, the four smaller models are also available in English-only versions.
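The model names matter later, because they are what you pass to the --model option. The following Python sketch assumes the Whisper project’s naming, in which the sizes are called tiny, base, small, medium, and large, and an English-only variant carries the suffix .en. The helper function model_name is hypothetical and only illustrates that naming.

```python
# Model sizes, smallest to largest (names as used by the Whisper project).
SIZES = ("tiny", "base", "small", "medium", "large")


def model_name(size, english_only=False):
    """Build the model name for a size, optionally the English-only variant.

    Hypothetical helper for illustration; it is not part of Whisper itself.
    """
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only:
        if size == "large":
            # Only the four smaller sizes have an English-only version.
            raise ValueError("the large model has no English-only variant")
        return f"{size}.en"
    return size


if __name__ == "__main__":
    print(model_name("medium"))
    print(model_name("small", english_only=True))
```

The returned string can be used wherever a model name is expected, for example after --model on the command line.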

Until now, this task was handled by a few cloud services that people would rather not trust with their money or data. The good news is that Whisper is a) free software, b) very easy to use, and c) delivers convincing results. If you are interested in the details of the system, you can find them in the source or at Heise. What interests me is whether it works.

Experience

Until now, you could build an STT engine from services, models, and instructions using PyTorch and the Hugging Face Transformers, plus a lot of experience. But this was hardly feasible for the moderately interested user. Now the tide has turned, and that’s a good thing.

First, check whether ffmpeg is installed on your system, which it usually is. Just type ffmpeg in the terminal; you will either see its output or be prompted to install it if it is missing.
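The same check can be done from Python, which is handy if you want to script the setup. This is a minimal sketch using only the standard library; the function name find_ffmpeg is made up for this example.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path:
        print(f"ffmpeg found at {path}")
    else:
        print("ffmpeg not found; install it with your distribution's package manager")
```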

Then create a subdirectory with any name, for example whisper. Now change to this directory in the terminal with cd whisper and run this:

pip install git+https://github.com/openai/whisper.git
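Before or after running this command, a quick look from Python can tell you whether pip and the whisper package are available. This is only a sketch: it assumes the package installs under the import name whisper, and the helper is_installed is a name chosen for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "whisper" is assumed to be the import name of the package installed above.
    for name in ("pip", "whisper"):
        status = "available" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```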

If you do not have the Python installer pip, you can install it from your distribution’s software store. That’s it. Nothing else needs to be installed. What you need now is an audio file in English. For my test, I extracted about 1 minute from the latest episode of the LateNightLinux podcast. This is the file you should listen to in order to compare it with the transcription.

In the next step, this audio file is converted to text. To do this, save the audio file (or any other audio file in English) under the name latenightlinux.mp3 in the directory you created earlier and run this simple command:

whisper latenightlinux.mp3 --model medium

It is self-explanatory: whisper is run on the file latenightlinux.mp3 using the medium language model (769 M parameters). Now you must be patient. Depending on the performance of your computer, it will take about 15 minutes to generate the text. You can follow the progress in the terminal.

While whisper does its job, you can follow the transcription in 30-second segments in the terminal. At the end, there is also a text file in the directory: latenightlinux.mp3.txt

But to distract ourselves from this, this is a very sad day. Let’s talk about our discoveries. Will, what is a navidrom? Previously in Discoveries I had learned how to take my audiobook out of the audio and convert it to mp3 on m4a or whatever you want, which is fine and fine but I don’t want to carry something like. I don’t know the value of seven or eight carts. It’s a very long part on my phone like my expensive phone storage. Cloud storage is very cheap and my data plan on my phone is very cheap so what I want to do is store it in the cloud and stream it to my phone like you would with Spotify or one of those things.

You can now put on headphones and compare the audio file with the written text. I think you’ll come to the same conclusion as me: It’s pretty good. Now think about how you can use Whisper for your purposes.
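One possible starting point is to call Whisper from Python instead of from the command line. The following is a minimal sketch, assuming the whisper package installed earlier and its load_model and transcribe functions. The helper names transcribe_file, format_timestamp, and format_segments are made up for this illustration, and the timestamp layout only approximates what the command-line tool prints.

```python
import os


def format_timestamp(seconds):
    """Format a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments):
    """Turn a list of segments (dicts with start, end, text) into printable lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(audio_path, model_name="medium"):
    """Load a Whisper model and transcribe one audio file."""
    import whisper  # third-party package, imported lazily so the helpers work without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)


if __name__ == "__main__":
    audio = "latenightlinux.mp3"  # the example file used in this article
    if os.path.exists(audio):
        result = transcribe_file(audio)
        print(format_segments(result["segments"]))
    else:
        print(f"{audio} not found; place it in the current directory first")
```

The result returned by transcribe is a dictionary; besides the list under "segments", the complete transcription is available under "text".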

source: https://openai.com/blog/whisper/
