YouTube Captions to Arc (updated 26 Oct. 2018)

ProfessorBeyrer
Community Coach

Earlier this year I proposed the idea https://community.canvaslms.com/ideas/11306-arc-recognize-captioned-youtube-videos?sr=search&searchI...‌, but it has not happened yet. So I did some poking and searching and found a solution.

In the embedded video I demonstrate the following tasks:

  1. Use the website Youtube2Subtitle.com to generate a .srt file from a YouTube video's captions
  2. Upload the caption file into Arc

I made the video demo on my iMac, where I was able to download the .srt file directly from the website. It is also possible to create your own .srt file by copying the captions from the YouTube2Subtitle.com website into a plain-text editor (like Notepad on Windows) and saving the file with a .srt extension.

Edited with a new process:

This week I looked at the site mentioned in this blog entry and found out it no longer functions. Ugh! Before I fell into too much despair I saw that it is possible to download a copy of the captions file from YouTube. The video must be set to allow community members to contribute captions. If the video already has English captions provided by the creator, those cannot be downloaded, but I discovered that I could tell YouTube I was contributing captions in a friendly language, say Canadian English, and then one-click copy the published captions to my new language. That copy could then be downloaded. (I say "friendly" because I tried to create captions in Klingon, but that language does not have the one-click copy from English.)

YouTube does not download captions in a file format that Arc recognizes, so I had to convert those captions (.sbv file) into an Arc-friendly format (.srt file). I found a site that does this (https://captionsconverter.com). We'll see how long this one lasts!
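If the converter site ever disappears too, the .sbv to .srt conversion is simple enough to do locally. The following is only a minimal sketch, not what captionsconverter.com does internally. It assumes the .sbv file uses timing lines like `0:00:01.000,0:00:03.500` followed by the caption text and a blank line; the function name `sbv_to_srt` is hypothetical.

```python
import re

# Assumed SBV timing line: start,end written as H:MM:SS.mmm
_SBV_TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)


def _srt_time(hours, minutes, seconds, millis):
    # SRT pads the hour to two digits and uses a comma before milliseconds
    return f"{int(hours):02d}:{minutes}:{seconds},{millis}"


def sbv_to_srt(sbv_text):
    """Convert SBV caption text to SRT text (hypothetical helper)."""
    normalized = sbv_text.replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        match = _SBV_TIMING.match(lines[0].strip())
        if not match:
            continue  # skip any block that does not start with a timing line
        parts = match.groups()
        start = _srt_time(*parts[:4])
        end = _srt_time(*parts[4:])
        text = "\n".join(lines[1:])
        # SRT cues are numbered from 1 and use "-->" between the timestamps
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

To use it, read the downloaded .sbv file as text, pass the contents to `sbv_to_srt`, and save the result with a .srt extension before uploading it to Arc.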

It's all demonstrated in this video:

5 Comments
James
Community Champion

Arc users met at InstructureCon in 2018 and were joined by some of the Arc team. They said this isn't going to happen [ever] because it is a violation of Google's terms of service. While users might be able to do this, Instructure can't. It's not because they don't want to or because they can't write the software to do it.

They were unaware of the hack to get the captions to load from YouTube. It's not an acceptable work-around and I don't want my students going to YouTube and watching directly -- I won't be able to tell what they've watched if they do that.

I started off using the built-in captioning system with my Arc videos, but then changed the process. My home-made videos were already on YouTube, but the captioning was terrible and I found that Arc's speech recognition seemed more accurate. Still, it had issues.

My workflow for captioning my videos is now this:

  1. Upload the videos into Arc and have Arc do the automatic speech recognition to create Captions.
  2. Download the SRT file from Arc. Run it through awk to strip out all of the timings and leave just the text. Then bring it into a word processor where I clean it up by fixing the punctuation, spelling, and line breaks. The biggest issue I have with Arc's captions is that they break lines at unnatural places. Fixing the lines to break at natural points of speech messes up the timings.
  3. At this point what I have is really a transcript, not a caption file. I take that transcript, go back to YouTube, and upload it to my original YouTube video. It does the transcript to caption conversion. While YouTube is terrible at recognizing speech and converting it to text, it does a pretty good job of matching text to speech.
  4. I download the SRT file from YouTube and upload it to Arc, replacing the one that they had.

Now I have a good copy of Captions in both YouTube and Arc.
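For anyone without awk handy, step 2 (stripping the timings out of the SRT file so only the text remains) could be approximated in Python. This is a rough sketch rather than the awk command I actually run; it assumes a standard SRT layout (cue number, timing line, text, blank line), and the name `srt_to_transcript` is made up for illustration.

```python
import re

# Standard SRT timing line, e.g. 00:00:01,000 --> 00:00:03,500
_SRT_TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}"
)


def srt_to_transcript(srt_text):
    """Drop cue numbers and timings, returning the caption text as one string."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    kept = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # Remove the cue number only when a timing line follows it, so a
        # caption that happens to be a bare number is not thrown away.
        if (
            len(lines) >= 2
            and lines[0].strip().isdigit()
            and _SRT_TIMING.match(lines[1].strip())
        ):
            lines = lines[2:]
        elif _SRT_TIMING.match(lines[0].strip()):
            lines = lines[1:]
        kept.extend(line.strip() for line in lines if line.strip())
    return " ".join(kept)
```

The output is a single run of text, which is convenient for pasting into a word processor to fix punctuation and line breaks before uploading the transcript to YouTube.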

This works for me because I'm the creator of the content. The Arc caption editor is okay and I liked it the more I used it, but I found I got better captions by doing the process above. It is more work, though.

I've also found that when using Adobe Premiere to make my videos, if I already have a transcript, I can play the video and set markers. Then I can export the markers, bring them into Excel, and reformat them to be an SRT file or WebVTT. This is what I'll probably be doing for new videos that aren't already on YouTube.

ProfessorBeyrer
Community Coach

Thanks for sharing that feedback from the Instructure team. I wonder how it's okay to embed a YouTube video in Arc but not the YouTube captions. And I wonder why YouTube2Subtitle.com can get away with it (along with all the other sites that do something similar when I searched, uh, Google for how to download captions from a YouTube video) but Instructure feels they cannot. Maybe Google has somehow found some value in captions that eludes a commonsense appraisal.

ken_cooper
Community Participant

I was with James at Hack Night @ InstructureCon and was able to talk to Canvas folks about this, and it is indeed what they said. It's so odd to me, since you can force captions to show in YouTube videos when you embed them by adding "?cc_load_policy=1" to the URL in the embed code, but you can't do this within Arc. I mean, if they would just let us manually embed the videos and customize the embed code in Arc it seems as though it would work, but at this point I don't really have faith that they are working towards this change on a large scale.
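To make the embed trick concrete, here is a small sketch that builds an embed URL with the captions parameter attached. It only applies to embeds you control outside of Arc, and the helper name `youtube_embed_url` and the placeholder video ID are made up for illustration.

```python
from urllib.parse import urlencode


def youtube_embed_url(video_id, captions_on=True):
    """Return a YouTube embed URL, optionally forcing captions to show."""
    base = f"https://www.youtube.com/embed/{video_id}"
    # cc_load_policy=1 asks the player to show captions by default
    params = {"cc_load_policy": 1} if captions_on else {}
    return f"{base}?{urlencode(params)}" if params else base
```

The resulting URL goes in the `src` attribute of the iframe in the embed code.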

The instructions I give now are to have the student open the video in Arc, then click out to YouTube to access the video. They can then close the YouTube window and go back into Arc, and the video will play with the captions (and the instructor can then view the analytics). It's odd that it works, but that is where we are with Arc.

Chris_Hofer
Community Coach

I tried this on YouTube videos with 1) manually entered captions and 2) auto-generated captions. For the videos with auto-generated captions, I was not able to download an associated SRT file. Only videos with manually entered captions produced an associated SRT file.

buttramc
Community Novice

This method still worked as of today. Thanks for taking the time to post this.