macOS: Transcribe Training Video using OpenAI Whisper

What?
I noted this down in case I need to do it again. This article documents how to take a 5.5-hour video (MP4) and have it transcribed by a locally installed AI on macOS (running Sequoia 15.1.1).

Why?
Needed for training videos as a follow-on from my article on how to download videos from a session on TrainerCentralSite.com.

How?
OpenAI's ChatGPT doesn't like file attachments over 500 MB, nor more than 10 files at a time. I have a 1.5 GB video with a duration of 5.5 hours. I struggled with Whisper and ultimately used VOSK, but I'm leaving the Whisper instructions here in case they work for you.

Pre-amble
Requires Homebrew, FFmpeg, pip, and Python:
  1. Open the "Terminal" app
  2. Type brew install ffmpeg
  3. Extract audio: ffmpeg -i your_video.mp4 -vn -ar 16000 -ac 1 -b:a 64k output_audio.mp3
  4. [Optional if using online version of OpenAI Whisper] Split audio into segments: ffmpeg -i output_audio.mp3 -f segment -segment_time 600 -c copy output_chunk_%03d.mp3
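
As a rough sanity check on the figures above, the sketch below estimates the size of the extracted audio and the number of chunks the split produces. It assumes a constant 64 kbit/s bitrate (as in the extraction command) and ignores container overhead; the function names are illustrative only and are not part of FFmpeg or any library.

```python
import math


def audio_size_mb(duration_hours: float, bitrate_kbps: int) -> float:
    """Approximate size in megabytes of a constant-bitrate audio track."""
    seconds = duration_hours * 3600
    total_bits = seconds * bitrate_kbps * 1000
    return total_bits / 8 / 1_000_000


def chunk_count(duration_hours: float, segment_seconds: int) -> int:
    """Number of segments when splitting into fixed-length pieces."""
    return math.ceil(duration_hours * 3600 / segment_seconds)


def upload_batches(chunks: int, files_per_upload: int) -> int:
    """Number of uploads needed when only a few files fit per upload."""
    return math.ceil(chunks / files_per_upload)


size = audio_size_mb(5.5, 64)       # 5.5 hours at 64 kbit/s
chunks = chunk_count(5.5, 600)      # 600-second segments
batches = upload_batches(chunks, 10)
print(f"{size:.1f} MB, {chunks} chunks, {batches} upload batches")
```

Under these assumptions the audio is around 158 MB (well under the 500 MB limit), but the 33 chunks would still need about four uploads of 10 files each.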

Install Python and PIP
  1. Still within the terminal: brew install python
  2. Upgrade to latest PIP: pip3 install --user --upgrade pip
Install Numpy
  1. As user (safest): pip3 install --user numpy
  2. Check the NumPy version: python3 -c "import numpy; print(numpy.__version__)"

Use OpenAI Whisper
  1. Install Whisper using PIP: pip3 install -U openai-whisper
  2. Determine PATH of Whisper: find ~/.local/bin /usr/local/bin ~/Library/Python/*/bin -name whisper 2>/dev/null
  3. Add to PATH
    1. Edit your zsh file: nano ~/.zshrc
    2. Append the PATH line, using the directory found in step 2 (the Python version in the path may differ on your machine): export PATH="$PATH:/Users/yourname/Library/Python/3.9/bin"
    3. Save and Exit
      1. Press Control + O (write the file)
      2. Press Enter (confirm filename)
      3. Press Control + X (exit nano)
    4. Apply it: source ~/.zshrc
    5. Test it: whisper --help
  4. Run Whisper on the full (unsplit) MP3: whisper output_audio.mp3 --model small
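
If the command-line tool works, the same transcription can also be driven from Python, which makes it easier to keep timestamps next to each line. This is a minimal sketch: whisper.load_model and model.transcribe come from the openai-whisper package, while format_timestamp, segments_to_lines and transcribe are hypothetical helper names introduced here for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments):
    """Turn Whisper-style segments into '[HH:MM:SS] text' lines."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]


def transcribe(audio_path: str, model_name: str = "small"):
    """Transcribe an audio file and return timestamped lines."""
    # Imported here so the helpers above work without Whisper installed
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_lines(result["segments"])
```

Calling transcribe("output_audio.mp3") returns a list of lines that can be joined with newlines and written to a text file.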

Error(s):
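  • PATH breaks, no commands work: reset it with the following (note that > overwrites the whole ~/.zshrc file), then open a new Terminal window: echo 'export PATH="/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/<your_username>/Library/Python/3.9/bin"' > ~/.zshrc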
  • RuntimeError: Numpy is not available:
    1. Check which Python Whisper uses: head -n 1 $(which whisper)
    2. Note the result for the next command, e.g. #!/Library/Developer/CommandLineTools/usr/bin/python3
    3. Install NumPy for that exact Python: /Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --user numpy
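
The NumPy fix above can be sketched as a small script that reads the first line of the whisper launcher and prints the matching install command. This is only an illustration under the assumption that the launcher starts with a "#!" line, as in the example above; shebang_interpreter and numpy_install_command are hypothetical names.

```python
import shutil
from typing import Optional


def shebang_interpreter(script_path: str) -> Optional[str]:
    """Return the interpreter named on a script's '#!' line, if any."""
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


def numpy_install_command(command_name: str = "whisper") -> Optional[str]:
    """Build the pip command for the Python that runs the given command."""
    path = shutil.which(command_name)
    if path is None:
        return None  # command is not on the PATH
    interpreter = shebang_interpreter(path)
    if interpreter is None:
        return None  # launcher has no '#!' line
    return f"{interpreter} -m pip install --user numpy"


print(numpy_install_command() or "whisper not found on PATH")
```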

Alternative: Using VOSK
If "RuntimeError: Numpy is not available" persists, use VOSK instead:
  1. Install Vosk + Dependencies
    pip3 install vosk
    pip3 install soundfile

    # to apply punctuation and capitalized letters within transcript output
    pip3 install deepmultilingualpunctuation
  2. Download an English Speech Model:
    mkdir -p ~/vosk-models
    cd ~/vosk-models
    curl -O https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
    unzip vosk-model-small-en-us-0.15.zip
  3. Convert Your MP4 to MP3 (extract the audio from the video)
    ffmpeg -i input_video.mp4 -q:a 0 -map a output_audio.mp3
  4. Convert Your MP3 to WAV (mono, 16 kHz)
    ffmpeg -i output_audio.mp3 -ar 16000 -ac 1 -f wav output-audio.wav
  5. Create a Python Script to Transcribe (save it as transcribe_vosk.py)
    import json
    import wave

    from vosk import KaldiRecognizer, Model

    # Paths: adjust to match your WAV file and the unzipped model folder
    AUDIO_PATH = "output-audio.wav"
    MODEL_PATH = "/Users/<your_username>/vosk-models/vosk-model-small-en-us-0.15"

    model = Model(MODEL_PATH)
    transcript = []

    with wave.open(AUDIO_PATH, "rb") as wf:
        # Vosk expects mono, 16-bit PCM audio
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise SystemExit("Audio must be a mono, 16-bit PCM WAV file")

        rec = KaldiRecognizer(model, wf.getframerate())

        # Feed the audio to the recognizer 4000 frames at a time
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            # AcceptWaveform returns True once an utterance is complete
            if rec.AcceptWaveform(data):
                result = json.loads(rec.Result())
                transcript.append(result.get("text", ""))

        # Collect whatever is still buffered in the recognizer
        final = json.loads(rec.FinalResult())
        transcript.append(final.get("text", ""))

    # Save the transcript, skipping empty results
    with open("transcript.txt", "w", encoding="utf-8") as f:
        f.write(" ".join(text for text in transcript if text))
  6. Run the Python Script to Transcribe
    python3 transcribe_vosk.py
    I left the script running overnight, so I'm not sure whether it took the full 5.5 hours (as if playing the audio back in real time) or finished sooner; either way, the next morning I had a "transcript.txt" file in the working directory.

    However, the transcript file is entirely lowercase with no punctuation, so ChatGPT was not able to make much sense of it or reformat it.

  7. Give the transcript to some popular LLMs:
    • [Score: 1/10 for effort] I gave the transcript.txt file (all lowercase, no punctuation) to OpenAI's ChatGPT as an attachment (~100 pages) and gave it the following prompt:
      rephrase the following so that it makes sense to a trainee:

      "<first paragraph of transcription text file>"

      // wait for completion and if paragraph is ok
      Could you do the same with the attached please? Feel free to include details of what you know of [Video Topic].
    • [Score: 7/10] I gave the transcript.txt file (all lowercase, no punctuation) to Google's Gemini as an attachment (~100 pages) and gave it the following prompt:
      Attached is a transcript from a training session on [Topic] - Day 3. Can you rewrite this using headings and punctuation so that it makes sense to other trainees reading it? Rephrase where necessary and include details from the official documentation for [Topic] where applicable.

      // wait for completion and copy&paste into a MS Word document.
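
The lowercase, unpunctuated transcript could also be cleaned up locally before handing it to an LLM, using the deepmultilingualpunctuation package installed in step 1. The sketch below is untested against the 5.5-hour transcript: PunctuationModel and restore_punctuation come from that package, while the 200-word batching, the capitalization helper, and the output filename are assumptions made for illustration. Sentences that straddle a batch boundary may be punctuated imperfectly.

```python
import re


def word_batches(text: str, size: int = 200):
    """Split text into strings of at most `size` words each."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the text and of each new sentence."""
    return re.sub(
        r"(^|[.?!]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


def punctuate_file(src="transcript.txt", dst="transcript_punctuated.txt"):
    """Restore punctuation in src and write the result to dst."""
    # Imported here so the helpers above work without the package installed
    from deepmultilingualpunctuation import PunctuationModel

    model = PunctuationModel()
    with open(src, encoding="utf-8") as f:
        raw = f.read()
    restored = " ".join(
        model.restore_punctuation(batch) for batch in word_batches(raw)
    )
    with open(dst, "w", encoding="utf-8") as f:
        f.write(capitalize_sentences(restored))
```

Running punctuate_file() in the working directory would read "transcript.txt" and write a punctuated copy alongside it.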

Thank you for visiting and, as always, we hope this website was of some use to you!

Kind Regards,

Joel Lipman
www.joellipman.com
