
Interactive YouTube transcript

I have made some JavaScript code to support interactive YouTube transcripts. A working example can be seen here: Look here for instructions on how to make it work in Canvas or any other web page: at master · PfDK/ · GitHub

If anyone has any ideas on how to improve the code or wants to help improve it, that would be great!

18 Replies

That's really awesome - if you embed that in a module, etc., does it pick up the Canvas styling?

Community Coach

That's really neat, @erlend_thune. Nice work! Hey @James, have a look at this! (@Renee_Carney suggested I reach out to you.)

Navigator II

Nice use of the YouTube API to bring the text off the video and onto the page, @erlend_thune.

I stumbled across one of your other contributions this week as I was trying to get WordPress and H5P to play nice with Canvas. I ended up breaking my xAPI installation before I got to try your script, but thank you for making these available to people.

One way around the duplicated video issue, if that really is an issue, would be to get the ID of the video from the src attribute on the iframe and make everything else generic without tacking on the specific video ID.

Were you envisioning that people use this outside of an iframe? If so, ignore most of what I wrote in the last paragraph.
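The idea of reading the ID off the iframe could look something like the sketch below. `videoIdFromSrc` is a made-up helper name, and the regex assumes the standard 11-character YouTube ID in an `/embed/` URL:

```javascript
// Hypothetical helper: pull the 11-character video ID out of an embed URL,
// e.g. "https://www.youtube.com/embed/Ux1iQBU09oA?enablejsapi=1".
function videoIdFromSrc(src) {
  var match = /\/embed\/([A-Za-z0-9_-]{11})/.exec(src);
  return match ? match[1] : null;
}

// In the page it would be read from the existing iframe, e.g.:
// var videoId = videoIdFromSrc(document.querySelector('iframe').src);
```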

What is the purpose of the 0:00 at the beginning? It's the only timestamp and it doesn't change for me.

Moving past the initial "wow, this is really cool and a great idea" reaction (and it's great that this code is available and documented so that people can build off of it), I'm trying to wrap my brain around how it would be used. I said the same thing when Alexa for Canvas came out.

I can see the desire to extract a transcript automatically from YouTube. There has been a push to closed-caption videos, and we finally have someone at our school other than me who is advocating that. The problem with YouTube transcription is that you lose all sense of the speaker and any paragraph breaks. In other words, you won't get nicely formatted output this way.

The text extracted from YouTube and displayed on the page ran together, which is where the highlighting came into play so that I could follow along and see where the text was.

You lose the context of paragraphs and speaker when it is highlighting what is being said. If someone had a nicely formatted transcript that kept the formatting, it might be nice to highlight that text when the video is played, although I don't think that is technically possible through the YouTube API. It would probably require the two transcripts to match really closely, though.

But now I'm back to the question of how we will use this. If I am sighted but hearing impaired, then I would turn on the closed captioning so that I could watch the video without the highlighting on the text distracting me and causing me to jump back and forth.

If I am visually impaired but can hear, then I am getting the audio from the video or the audio from the screen reader that is reading the transcript, but the screen reader isn't able to do anything other than read a single block of text since it isn't marked up in any way with the speaker or paragraphs. Again, I think that's a limitation of where the text came from.

I think I might have just answered my own question, but I tend to leave things I've written as a trail of my thought process. What I just noticed after writing all this is that blocks of text are highlighted when you mouse over something. I missed that the first time as I didn't mouse over the text, I just loaded the page and played the video. This allows you to jump to a specific portion of the video and I can see the use for that a little better. Videos tend to be too long and this would help a student jump straight to the relevant portion. So far, this is the best part of this for me.

Maybe you could add a section to the documentation that explains the features, rather than just saying "here's a demo"? That way we know ahead of time what to expect and don't miss the biggest thing because we didn't move our mouse over it.

If there was a way to take a nicely formatted transcript and link it to the YouTube video and apply the interactivity to the formatted text, it would be amazingly awesome, although I'm not sure what benefit it would have over a nicely formatted text transcript with time stamps.

But I'll be the first to admit that I miss what may be obvious to other people. After all, I'm still trying to figure out the purpose of Alexa, and, to a lesser extent, cell phones.

@phanley,

Iframes do not have access to the CSS styling of their parent windows. You would need to style it yourself in this case.

That is a good idea, Peter. James, it is only the YouTube video that is in an iframe. The video transcript is part of the Canvas page, so it is possible to use any of the Canvas styling classes. I've added it as an issue in the GitHub project.

Thank you, James. I will add a better description of the features on the GitHub page! I agree that the yellow highlighting is disturbing, and it should at least be made a bit more nuanced!

All new educational material in Norway must fulfill the WCAG 2.0 universal design requirements; that's why I started looking into this. I have just taken a MOOC at Coursera, and noticed that they had much better support for transcripts than Canvas. They do not, however, highlight the text, probably for the same reasons as you describe. But if someone for some reason wanted to only hear the sound of the video and read the transcript on screen while listening, I guess the highlighting could come in handy. A button where you can turn the highlighting on/off, and/or a parameter in the JavaScript to turn that functionality on/off, could be a start.

I am not sure what you mean by a nicely formatted transcript from YouTube. If you want line breaks etc., that is easy to add. YouTube does not support automatic subtitling of Norwegian yet, so we have to write the subtitles in manually.

The purpose of the 0:00 timestamp is to be able to jump to the very start of the video. The UX and UI part can surely be improved.

Anyway, I agree that the best part is that you can search through the transcript for words you remember from the video and then be able to jump to that part of the video.
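For illustration only (this helper is not in the actual script, and the name `formatTimestamp` is made up), the "m:ss" labels like 0:00 could be generated from a caption's start time in seconds:

```javascript
// Format a start time in seconds as the "m:ss" label shown in the transcript.
function formatTimestamp(seconds) {
  var total = Math.floor(seconds);
  var m = Math.floor(total / 60);
  var s = total % 60;
  return m + ':' + (s < 10 ? '0' + s : s);
}
```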

I am not sure what you mean by a nicely formatted transcript from YouTube.

What I meant was I didn't think that YouTube would identify who the speaker was or put breaks into the text. Other people may be able to speak in full and complete sentences, but I normally have a sentence that takes more than one screen of captions, so you wouldn't be able to break just on sentences.

What I consider a nicely formatted transcript would be one that indicates the speaker (if it changes), and has paragraphs with marked-up code, things that identify sound effects, etc. Basically what they describe at W3C Multimedia Accessibility FAQ and the example Podcast: Interview on WCAG 2 that was linked from there. What I was trying to say is that I don't think YouTube can give that level of captioning, but I might be missing something.

That WCAG page says that captioning is essentially a transcript synchronized with the video. That's what yours does and in a way that allows people to do translations on the page if needed, whereas closed captioning wouldn't be as easy to translate.


I've been playing around with this script and made a few changes based on my own ideas (mostly making it re-usable) and also, sort of, what @James brought up regarding the formatting. I should probably get back to work for now, but I'd be happy to keep working at it.

Here's basically where I am with it: 

It also works with the original post video:ål&lang=no 

But I can't figure out how to get it to work with arbitrary videos that have autogenerated captions: 

Apologies for the personal video, but it was the only one I had where I was familiar with the captions (since I edited them myself).


  • added Canvas CSS, although in retrospect it's probably not worth the added bloat since it only affects the typography of the transcript
    • although it is nice to use the grid they use (iframe & transcript are side-by-side in col-xs-6 divs)
      • also I ended up adding the canvas style .content-box-mini to the btnSeek spans (in youtubeIT.js) 
  • dynamically generate the iframe & transcript div from a GET parameter (?v=Ux1iQBU09oA) that can be copied from a YouTube URL. It uses a nice convenience object called urlObject:
    var page_url = urlObject(window.location.href);
    ytvid = page_url.parameters.v || "Ux1iQBU09oA";
    lang = page_url.parameters.lang || "en";
    vname = || "";
    • this needs some work - it's a little tedious to figure out the transcript language/name, so it should default to the autogenerated transcript if possible, but I haven't figured that out yet  
  • Made the iframe "responsive" (sort of - it resizes with the window, anyway)
  • Changes to youtube.js:
    • changed the name of youtube.js to youTubeIT.js, but that's just because I wanted to edit it and be able to switch between the original and my modifications easily. At some point I refactored the object from mmooc to youTubeIT just to make it feel more "library-ish"
    • the formatter addition required adding a function argument to ensure the transcript is loaded before formatting, and while playing around I changed the structure of the js a little bit so that the main function is named youTube and it gets called after the iframe_api is loaded like = youTube(formatter);
    • initialization is a little more complicated, but not really - you just have to declare the formatter function before adding the youtubeIT.js script:

      var formatter = function(){
          $('.btnSeek').each(function(i, val){
              var $p = $(val);
              // .replace() returns a new string, so write the result back with .html()
              $p.html($p.html().replace(/^(Zaybee-Wan\:|Darth Paapa\:)/, '<strong>$1</strong>'));
          });
      };

      $.getScript( "youtubeIT.js");

I think that's it?  I have some questions that I'll ask in another post.  
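As an aside, the GET-parameter handling above could also be sketched with the standard URL/URLSearchParams API (available in modern browsers), which would remove the urlObject dependency. The parameter names and defaults below mirror the snippet above; `readPlayerParams` is an illustrative name:

```javascript
// Read ?v=...&lang=...&name=... from the page URL, falling back to the
// same defaults as the urlObject-based version.
function readPlayerParams(href) {
  var params = new URL(href).searchParams;
  return {
    ytvid: params.get('v') || 'Ux1iQBU09oA',
    lang: params.get('lang') || 'en',
    vname: params.get('name') || ''
  };
}

// Usage in the page: var p = readPlayerParams(window.location.href);
```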

That's great Peter! I'll try to integrate your changes in the github project, unless you want to upload them yourself.

If you use the network inspector, you will see that the following URL is called when you turn on CC on the auto-generated CC video:

Network inspector:

Breaking it up gives:

Now, removing the signature, for example, results in an error message if you try to get that URL:,caps,v,xorp,expi...

But, if you do the same with one of the videos with a manually added transcript, like this one (PfDK MOOC - Ny videreutdanning i digital kompetanse for lærere - YouTube ), you get this url for the timedtext in the network inspector: 

Breaking that up:

Looks quite similar, but removing the signature, and for that matter everything else except the v, lang and name parameters, still gives the subtitles:

So grabbing the subtitles behaves differently for ASR (Automatic Speech Recognition) tracks for some reason.
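For the manually captioned case, the stripped-down URL could be built like the sketch below. The host and path are an assumption on my part (the thread's actual URLs were trimmed), and this timedtext endpoint is undocumented and unofficial, so it may change or disappear without warning:

```javascript
// Sketch: rebuild the minimal timedtext URL with only v, lang and name.
// The host/path is an assumption; the endpoint is unofficial and may
// stop working at any time.
function timedTextUrl(videoId, lang, name) {
  var url = 'https://video.google.com/timedtext' +
            '?v=' + encodeURIComponent(videoId) +
            '&lang=' + encodeURIComponent(lang);
  if (name) {
    url += '&name=' + encodeURIComponent(name);
  }
  return url;
}
```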

I've tried looking into using the Google YouTube Data API for downloading captions (Captions: download | YouTube Data API | Google Developers), but I can't get it to work without asking the user to log in with a Google account!

Also, I guess there is a risk that YouTube will remove support for the timedtext URLs, making it risky to use that approach in any case.

One solution could be to store the subtitles on a private server, e.g. Dropbox, and download the subtitles from there instead. I tried that here:

This html file uses that subtitle file: 

To make the javascript look in dropbox, I changed/added these lines in the js file:

   -> changed: var hrefPrefix = "";
   -> added:    var hrefPostfix = ".xml";

   -> changed:         var href = hrefPrefix + videoId + hrefPostfix;

I also had to change this:

                captionText = captions[i].textContent;
//                captionText = captions[i].textContent.replace(/</g, '&lt;').replace(/>/g, '&gt;');

The last change was because the ASR timedtext for your video contained color coding for some reason!
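Whichever server the XML comes from, the `<text start="..." dur="...">...</text>` entries could be parsed with a small sketch like this. `parseTimedText` is an illustrative name, and the regex assumes the attribute order YouTube emits (start before dur); a real page would more likely use the browser's DOM parser, as the script does:

```javascript
// Parse timedtext XML into {start, dur, text} objects without a DOM.
// Entities are decoded in a safe order (&amp; last, so "&amp;lt;" is not
// double-decoded into "<").
function parseTimedText(xml) {
  var captions = [];
  var re = /<text start="([\d.]+)"(?: dur="([\d.]+)")?[^>]*>([\s\S]*?)<\/text>/g;
  var m;
  while ((m = re.exec(xml)) !== null) {
    captions.push({
      start: parseFloat(m[1]),
      dur: m[2] ? parseFloat(m[2]) : 0,
      text: m[3].replace(/&lt;/g, '<').replace(/&gt;/g, '>')
                .replace(/&quot;/g, '"').replace(/&#39;/g, "'")
                .replace(/&amp;/g, '&')
    });
  }
  return captions;
}
```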

I guess one could have a parameter in the url indicating where the script should look for the subtitles.

I wonder why YouTube makes it so difficult to show a transcript of their videos.