macOS: Transcribe Training Video using OpenAI Whisper

Category: Apple; 23 June 2025; Hits: 25758

What?
I noted this down in case I need to do it again. This article documents how to take a 5.5-hour video (MP4) and have it transcribed by a locally installed AI on macOS (running Sequoia 15.1.1).

Why?
Needed for training videos as a follow-on from my article on how to download videos from a session on TrainerCentralSite.com.

How?
OpenAI's ChatGPT doesn't accept a file attachment larger than 500 MB, or more than 10 files at a time. I have a 1.5 GB video with a duration of 5.5 hours. I struggled with this and ultimately used VOSK, but I'm leaving the instructions here in case they work for you.

Pre-amble
Requires Homebrew, FFmpeg, pip, and Python:

Open the "Terminal" app
Type brew install ffmpeg
Extract audio: ffmpeg -i your_video.mp4 -vn -ar 16000 -ac 1 -b:a 64k output_audio.mp3
[Optional if using online version of OpenAI Whisper] Split audio into segments: ffmpeg -i output_audio.mp3 -f segment -segment_time 600 -c copy output_chunk_%03d.mp3
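As a rough sanity check on those numbers, the arithmetic can be sketched in Python. It assumes the constant 64 kbit/s bitrate and the 600-second segments from the commands above, plus the 5.5-hour duration from this article. The function names are made up for illustration, and real MP3 files will differ slightly because of headers and frame boundaries.

```python
import math

def estimated_size_mb(duration_s: float, bitrate_kbps: int) -> float:
    """Approximate size of a constant-bitrate audio file in MB (10^6 bytes)."""
    return duration_s * bitrate_kbps * 1000 / 8 / 1_000_000

def chunk_count(duration_s: float, segment_s: int) -> int:
    """Approximate number of segments for a given segment length."""
    return math.ceil(duration_s / segment_s)

duration = 5.5 * 3600  # 5.5 hours in seconds
size_mb = estimated_size_mb(duration, 64)
chunks = chunk_count(duration, 600)

print(f"Whole file: ~{size_mb:.1f} MB")
print(f"Chunks: {chunks}, each ~{estimated_size_mb(600, 64):.1f} MB")
```

Under these assumptions the full MP3 comes to roughly 158 MB, well under the 500 MB limit, while splitting it gives 33 chunks, which is more than the 10 files allowed per upload.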

Install Python and PIP

Still within the terminal: brew install python
Upgrade to latest PIP: pip3 install --user --upgrade pip

Install Numpy

As user (safest): pip3 install --user numpy
Check NumPy version: python3 -c "import numpy; print(numpy.__version__)"

Use OpenAI Whisper

Install Whisper using PIP: pip3 install -U openai-whisper
Determine PATH of Whisper: find ~/.local/bin /usr/local/bin ~/Library/Python/*/bin -name whisper 2>/dev/null
Add to PATH
1. Edit your zsh file: nano ~/.zshrc
2. Append the PATH line: export PATH="$PATH:/Users/yourname/Library/Python/3.9/bin"
3. Save and Exit
  1. Press Control + O (write the file)
  2. Press Enter (confirm filename)
  3. Press Control + X (exit nano)
4. Apply it: source ~/.zshrc
5. Test it: whisper --help
Run Whisper on the original MP3: whisper output_audio.mp3 --model small
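If you would rather drive the same command from a script, here is a minimal sketch. The helper names are hypothetical; `--model`, `--output_dir` and `--output_format` are options of the whisper command-line tool, and the defaults chosen here are only an assumption about what you might want.

```python
import shutil
import subprocess

def build_whisper_command(audio_path: str, model: str = "small",
                          output_dir: str = ".",
                          output_format: str = "txt") -> list:
    """Build the argument list for the whisper CLI (hypothetical helper)."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--output_dir", output_dir,
        "--output_format", output_format,
    ]

def run_whisper(audio_path: str) -> None:
    """Run whisper if it is on the PATH; otherwise say what is missing."""
    if shutil.which("whisper") is None:
        print("whisper not found on PATH; check the 'Add to PATH' steps")
        return
    subprocess.run(build_whisper_command(audio_path), check=True)

# Show the command that would be run, without running it.
print(" ".join(build_whisper_command("output_audio.mp3")))
```

Calling `run_whisper("output_audio.mp3")` would then start the transcription, provided the PATH steps above worked.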

Error(s):

PATH breaks, no commands work: reset ~/.zshrc to a known-good PATH (note that > overwrites the whole file): echo 'export PATH="/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/<your_username>/Library/Python/3.9/bin"' > ~/.zshrc
RuntimeError: Numpy is not available:
1. Find which Python Whisper uses: head -n 1 $(which whisper)
2. Note the result for the next command, e.g. #!/Library/Developer/CommandLineTools/usr/bin/python3
3. Install NumPy for that exact Python: /Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --user numpy
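The three steps above can also be sketched in Python: read the first line of the whisper launcher, take the interpreter from it, and print the matching pip command. The function name is hypothetical, and this assumes the launcher is a text script that starts with a `#!` line, as in the example result above.

```python
import shutil
from typing import Optional

def shebang_interpreter(script_path: str) -> Optional[str]:
    """Return the interpreter from a script's '#!' line, or None if absent."""
    with open(script_path, "r", errors="replace") as f:
        first_line = f.readline().strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()

whisper_path = shutil.which("whisper")
if whisper_path:
    interpreter = shebang_interpreter(whisper_path)
    print(f"whisper runs under: {interpreter}")
    print(f"Install NumPy with: {interpreter} -m pip install --user numpy")
else:
    print("whisper is not on the PATH")
```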

Alternative: Using VOSK
If "RuntimeError: Numpy is not available" persists, use VOSK instead:

Install Vosk + Dependencies
```
pip3 install vosk
pip3 install soundfile

# to apply punctuation and capitalized letters within transcript output
pip3 install deepmultilingualpunctuation
```
Download an English Speech Model:
```
mkdir -p ~/vosk-models
cd ~/vosk-models
curl -O https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip
```
Convert Your MP4 to MP3 (extract the audio from the video)
```
ffmpeg -i input_video.mp4 -q:a 0 -map a output_audio.mp3
```
Convert Your MP3 to WAV (mono, 16 kHz)
```
ffmpeg -i output_audio.mp3 -ar 16000 -ac 1 -f wav output-audio.wav
```
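Before starting a long transcription, it may be worth confirming that the WAV really is mono at 16 kHz. This is a minimal sketch using Python's built-in `wave` module. The function name is hypothetical, and the 16-bit check is an assumption based on ffmpeg's default WAV output and on Vosk's example scripts, which expect mono PCM.

```python
import os
import wave

def check_wav_for_vosk(path: str, expected_rate: int = 16000) -> list:
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, found {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {8 * wf.getsampwidth()}-bit")
        if wf.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, found {wf.getframerate()} Hz")
    return problems

# Only run the check if the converted file is in the working directory.
if os.path.exists("output-audio.wav"):
    issues = check_wav_for_vosk("output-audio.wav")
    print("WAV looks OK" if not issues else "\n".join(issues))
```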

Create a Python Script to Transcribe (save it as transcribe_vosk.py)

```
from vosk import Model, KaldiRecognizer
import wave
import json

# Use the correct audio filename
wf = wave.open("output-audio.wav", "rb")
model = Model("/Users/<your_username>/vosk-models/vosk-model-small-en-us-0.15")
rec = KaldiRecognizer(model, wf.getframerate())

transcript = []

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        result = json.loads(rec.Result())
        transcript.append(result.get("text", ""))

# Get final result
final = json.loads(rec.FinalResult())
transcript.append(final.get("text", ""))

# Save transcript
with open("transcript.txt", "w") as f:
    f.write(" ".join(transcript))
```


Run the Python Script to Transcribe
```
python3 transcribe_vosk.py
```
I left the script running overnight so I'm not sure whether it simply played it for 5.5 hours and transcribed it or whether it did it in a shorter amount of time; but the next morning I had a "transcript.txt" file in the working directory.
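The loop in the script reads frames from the file as fast as the recognizer can process them, and nothing in it plays the audio in real time. To see how long a run actually takes, a small progress helper could be called from inside the loop. This is only a sketch: the function name is hypothetical and the numbers in the example call are invented, not measurements.

```python
def progress_line(frames_read: int, total_frames: int,
                  framerate: int, elapsed_s: float) -> str:
    """Describe how much audio has been processed and how fast."""
    audio_done_s = frames_read / framerate
    percent = 100.0 * frames_read / total_frames
    # Speed is audio seconds processed per wall-clock second.
    speed = audio_done_s / elapsed_s if elapsed_s > 0 else 0.0
    return (f"{percent:.1f}% done, {audio_done_s / 60:.1f} min of audio "
            f"in {elapsed_s / 60:.1f} min ({speed:.1f}x real time)")

# Invented example: 10 minutes of a 5.5-hour, 16 kHz file after 2 minutes.
print(progress_line(16000 * 600, 16000 * 19800, 16000, 120.0))
```

In the transcription script, `total_frames` would come from `wf.getnframes()`, `frames_read` would grow by 4000 on each pass through the loop, and `elapsed_s` could be measured with `time.perf_counter()`.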

However, the transcript file it produces is all lowercase with no punctuation, so ChatGPT was not able to make much sense of it or reformat it.
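The deepmultilingualpunctuation package installed earlier is meant for exactly this problem. Here is a minimal sketch of how it might be applied to the transcript. `PunctuationModel` and `restore_punctuation` come from that package. The capitalization helper is my own addition, on the assumption that the model restores punctuation but leaves the text in lowercase. The output filename in the note below is hypothetical.

```python
import re

def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the text and of each new sentence."""
    return re.sub(r"(^|[.?!]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)

def punctuate(text: str) -> str:
    """Restore punctuation with deepmultilingualpunctuation, then capitalize."""
    # Imported here so capitalize_sentences works without the package.
    from deepmultilingualpunctuation import PunctuationModel
    model = PunctuationModel()  # assumption: fetches its model on first use
    return capitalize_sentences(model.restore_punctuation(text))

# The capitalization step on its own, using a made-up sentence.
print(capitalize_sentences("welcome back. today we cover reports. any questions?"))
```

To process the real file, read transcript.txt, pass its contents to `punctuate`, and write the result to a new file such as transcript_punctuated.txt. Expect this to take a while on a transcript of around 100 pages.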
[Score: 1/10 for effort] I gave the transcript.txt file (all lowercase no punctuation) to ChatGPT as an attachment (~100 pages) and gave it the following prompt:
```
rephrase the following so that it makes sense to a trainee:

"<first paragraph of transcription text file>"

// wait for completion and if paragraph is ok
Could you do the same with the attached please? Feel free to include details of what you know of [Video Topic].
```

[Score: 7/10] I gave the transcript.txt file (all lowercase no punctuation) to Google's Gemini as an attachment (~100 pages) and gave it the following prompt:
```
Attached is a transcript from a training session on [Topic] - Day 3. Can you rewrite this using headings and punctuation so that it makes sense to other trainees reading it? Rephrase where necessary and include details from the official documentation for [Topic] where applicable.

// wait for completion and copy&paste into a MS Word document.
```

Category: Apple :: Article: 906
