I'm noting this down in case I need to do it again. This article documents how to take a 5.5-hour video (MP4) and have it transcribed by a locally installed AI on macOS (running Sequoia 15.1.1).
Why?
I needed this for training videos, as a follow-on from my article on how to download videos from a session on TrainerCentralSite.com.
How?
OpenAI's ChatGPT doesn't accept a file attachment larger than 500 MB, or more than 10 files at a time. I have a 1.5 GB video with a duration of 5.5 hours. I struggled with OpenAI Whisper and ultimately used VOSK, but I'm leaving the Whisper instructions here in case they work for you.
Pre-amble
Requires Homebrew, FFmpeg, pip, and Python:
- Open the "Terminal" app
- Type brew install ffmpeg
- Extract audio: ffmpeg -i your_video.mp4 -vn -ar 16000 -ac 1 -b:a 64k output_audio.mp3
- [Optional if using online version of OpenAI Whisper] Split audio into segments: ffmpeg -i output_audio.mp3 -f segment -segment_time 600 -c copy output_chunk_%03d.mp3
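The two ffmpeg commands above can also be driven from Python, which is handy if the steps need repeating for several videos. This is a minimal sketch, assuming ffmpeg is already on the PATH; the function names (`extract_audio_cmd`, `split_audio_cmd`, `expected_chunks`, `run_all`) are hypothetical helpers, not part of any library.

```python
import math
import subprocess


def extract_audio_cmd(video, audio="output_audio.mp3"):
    """Build the ffmpeg command that drops the video and writes 16 kHz mono audio."""
    return ["ffmpeg", "-i", video, "-vn", "-ar", "16000", "-ac", "1",
            "-b:a", "64k", audio]


def split_audio_cmd(audio="output_audio.mp3", seconds=600,
                    pattern="output_chunk_%03d.mp3"):
    """Build the ffmpeg command that cuts the audio into fixed-length segments."""
    return ["ffmpeg", "-i", audio, "-f", "segment",
            "-segment_time", str(seconds), "-c", "copy", pattern]


def expected_chunks(duration_seconds, seconds=600):
    """Rough number of segment files to expect for a given audio duration."""
    return math.ceil(duration_seconds / seconds)


def run_all(video):
    """Run both steps; check=True raises if ffmpeg exits with an error."""
    subprocess.run(extract_audio_cmd(video), check=True)
    subprocess.run(split_audio_cmd(), check=True)
```

Calling `run_all("your_video.mp4")` would run the same two commands as in the list. For a 5.5-hour recording, `expected_chunks(5.5 * 3600)` gives 33, so ten-minute segments would still exceed a 10-file upload limit in a single batch.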
Install Python and PIP
- Still within the terminal: brew install python
- Upgrade to latest PIP: pip3 install --user --upgrade pip
- Install NumPy for the current user only (safest): pip3 install --user numpy
- Check the NumPy version: python3 -c "import numpy; print(numpy.__version__)"
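Because macOS can have several Python installations side by side, it may help to see which interpreter is actually running and whether NumPy is importable from it. The following is a small sketch using only the standard library; `describe_environment` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_environment(module_name="numpy"):
    """Report which interpreter is running and whether a module is importable."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "module": module_name,
        "available": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running it with `python3` and again with any other interpreter on the machine shows whether they disagree about NumPy, which is the usual cause of a package being "installed" yet not found.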
Use OpenAI Whisper
- Install Whisper using PIP: pip3 install -U openai-whisper
- Locate the whisper executable: find ~/.local/bin /usr/local/bin ~/Library/Python/*/bin -name whisper 2>/dev/null
- Add its directory to PATH:
  - Edit your zsh file: nano ~/.zshrc
  - Append the PATH line, replacing the directory with the one found above: export PATH="$PATH:/Users/yourname/Library/Python/3.9/bin"
  - Save and exit:
    - Press Control + O (write the file)
    - Press Enter (confirm filename)
    - Press Control + X (exit nano)
  - Apply it: source ~/.zshrc
  - Test it: whisper --help
- Run Whisper on the original (unsplit) MP3: whisper output_audio.mp3 --model small
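The search for the whisper executable and the PATH line can be sketched in Python as well. This assumes the same three candidate locations as the find command above; `find_executable` and `path_export_line` are hypothetical helper names.

```python
import glob
import os
import shutil

# Same locations as the find command; ~ and * are expanded below.
CANDIDATE_DIRS = ["~/.local/bin", "/usr/local/bin", "~/Library/Python/*/bin"]


def find_executable(name="whisper", patterns=CANDIDATE_DIRS):
    """Return paths to `name`: whatever PATH resolves first, then candidate dirs."""
    found = []
    on_path = shutil.which(name)
    if on_path:
        found.append(on_path)
    for pattern in patterns:
        for directory in glob.glob(os.path.expanduser(pattern)):
            candidate = os.path.join(directory, name)
            is_runnable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
            if is_runnable and candidate not in found:
                found.append(candidate)
    return found


def path_export_line(executable):
    """Build the line to append to ~/.zshrc for the executable's directory."""
    return f'export PATH="$PATH:{os.path.dirname(executable)}"'
```

If `find_executable()` returns an empty list, Whisper was probably installed for a different Python than the one pip3 belongs to. Otherwise, `path_export_line(find_executable()[0])` produces the export line to paste into ~/.zshrc.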
Error(s):
- PATH breaks, no commands work: reset PATH to the system defaults plus the Python bin directory. Note that > replaces the entire contents of ~/.zshrc: echo 'export PATH="/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/<your_username>/Library/Python/3.9/bin"' > ~/.zshrc
- RuntimeError: Numpy is not available:
  - Check which Python Whisper uses: head -n 1 "$(which whisper)"
  - Note the result for the next command, e.g. #!/Library/Developer/CommandLineTools/usr/bin/python3
  - Install NumPy for that exact Python: /Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --user numpy
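The NumPy fix above relies on reading the first line (the shebang) of the whisper script to learn which Python it runs under. A minimal sketch of that lookup follows; `shebang_interpreter` and `numpy_install_command` are hypothetical helper names.

```python
import shlex


def shebang_interpreter(script_path):
    """Return the interpreter named on a script's first line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


def numpy_install_command(interpreter):
    """Build the pip command that installs NumPy for one specific interpreter."""
    # shlex.split also copes with shebangs such as "/usr/bin/env python3".
    return shlex.split(interpreter) + ["-m", "pip", "install", "--user", "numpy"]
```

Passing the path returned by `which whisper` to `shebang_interpreter` gives the same information as the head command. The resulting command list can then be handed to `subprocess.run`.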
Alternative: Using VOSK
If "RuntimeError: Numpy is not available" persists, use VOSK instead:
- Install Vosk + Dependencies:
- pip3 install vosk
- pip3 install soundfile
- # optional: to apply punctuation and capitalized letters within the transcript output
- pip3 install deepmultilingualpunctuation
- Download an English Speech Model:
- mkdir -p ~/vosk-models
- cd ~/vosk-models
- curl -O https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
- unzip vosk-model-small-en-us-0.15.zip
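- Convert Your MP4 to MP3 (extract the audio from the video, as in the Pre-amble)
- Convert Your MP3 to WAV (mono, 16 kHz), for example: ffmpeg -i output_audio.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output-audio.wav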
- Create a Python Script to Transcribe
    from vosk import Model, KaldiRecognizer
    import os
    import wave
    import json

    # Use the correct audio filename (mono, 16 kHz WAV)
    wf = wave.open("output-audio.wav", "rb")
    # Folder of the unzipped model; expanduser turns ~ into /Users/<your_username>
    model = Model(os.path.expanduser("~/vosk-models/vosk-model-small-en-us-0.15"))
    rec = KaldiRecognizer(model, wf.getframerate())

    transcript = []
    while True:
        # Read the audio 4000 frames at a time until the file is exhausted
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        # AcceptWaveform returns True once a complete utterance has been recognized
        if rec.AcceptWaveform(data):
            result = json.loads(rec.Result())
            transcript.append(result.get("text", ""))

    # Get final result
    final = json.loads(rec.FinalResult())
    transcript.append(final.get("text", ""))
    wf.close()

    # Save transcript
    with open("transcript.txt", "w") as f:
        f.write(" ".join(transcript))
- Run the Python Script to Transcribe, e.g. python3 transcribe.py if the script was saved as transcribe.py
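Before starting a run that may take hours, it can be worth confirming that the WAV file really is in the expected format and seeing how much audio it holds. The sketch below uses only the standard `wave` module. It checks for mono 16-bit PCM (the format the Vosk example scripts expect, as far as I know) plus the 16 kHz rate used in this article. `check_wav` is a hypothetical helper name.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return (problems, duration_seconds) for a WAV file."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, found {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {8 * wf.getsampwidth()}-bit")
        if wf.getcomptype() != "NONE":
            problems.append("expected uncompressed PCM")
        if wf.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, found {wf.getframerate()} Hz")
        # Total frames divided by frames per second gives the length in seconds.
        duration = wf.getnframes() / wf.getframerate()
    return problems, duration
```

For example, `problems, duration = check_wav("output-audio.wav")` should give an empty `problems` list and a `duration` of roughly 19800 seconds for a 5.5-hour recording.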
I left the script running overnight, so I'm not sure whether it needed the full 5.5 hours or finished sooner; either way, the next morning there was a "transcript.txt" file in the working directory.
However, the transcript is entirely lowercase with no punctuation, so ChatGPT could not make much sense of it or reformat it.
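The deepmultilingualpunctuation package installed earlier is never called by the transcription script, which may be why the output has no punctuation. The following is a sketch of how it could be applied, assuming the `PunctuationModel().restore_punctuation(text)` call shown in that package's README. As far as I can tell the model only inserts punctuation marks, so capitalization is handled by a separate regular-expression step. `capitalize_sentences` and `punctuate` are hypothetical helper names.

```python
import re


def capitalize_sentences(text):
    """Uppercase the first letter of the text and of each sentence after . ? or !"""
    text = text.strip()
    # A lowercase letter at the very start, or after end punctuation plus whitespace.
    text = re.sub(r"(^|[.?!]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
    # The standalone pronoun "i" also comes out of Vosk in lowercase.
    return re.sub(r"\bi\b", "I", text)


def punctuate(text):
    """Restore punctuation with deepmultilingualpunctuation, then capitalize."""
    # Imported here so capitalize_sentences works without the package installed.
    from deepmultilingualpunctuation import PunctuationModel
    model = PunctuationModel()  # the model is downloaded on first use
    return capitalize_sentences(model.restore_punctuation(text))
```

A possible use would be to read transcript.txt, pass its contents through `punctuate`, and write the result to a second file before giving it to an LLM. On its own, `capitalize_sentences("hello world. how are you? i am fine")` returns "Hello world. How are you? I am fine".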
Give the transcript to some popular LLMs:
- [Score: 1/10 for effort] I gave the transcript.txt file (all lowercase no punctuation) to OpenAI's ChatGPT as an attachment (~100 pages) and gave it the following prompt:
- rephrase the following so that it makes sense to a trainee:
- "<first paragraph of transcription text file>"
- // wait for completion and if paragraph is ok
- Could you do the same with the attached please? Feel free to include details of what you know of [Video Topic].
- [Score: 7/10] I gave the transcript.txt file (all lowercase no punctuation) to Google's Gemini as an attachment (~100 pages) and gave it the following prompt:
- Attached is a transcript from a training session on [Topic] - Day 3. Can you rewrite this using headings and punctuation so that it makes sense to other trainees reading it? Rephrase where necessary and include details from the official documentation for [Topic] where applicable.
- // wait for completion, then copy and paste into an MS Word document.