a_mottura
Community Participant

Download annotated submissions via API

Hi - I am trying to figure out a way to bulk download annotated submissions from Canvas. What I would like to be able to do is download all annotated submissions for a particular student across a number of Canvas courses. This is so that the student's tutor can look over the portfolio of marked and annotated submissions for their tutee from time to time without having to go through all the relevant courses. For practical reasons, I'd like tutors to actually get the files rather than a link to the file preview in docviewer (which shows the submissions).

I think the above cannot be done via the API. Using the submissions API (in the attachments), I can get the URL of the file (without annotations) or the preview url, which brings up the docviewer. The preview url seems to be an API call that gets redirected to a canvasdocs session. On that page, one can see the document with annotations, and there is a button on the top left that allows the viewer to download the annotated pdf.

Here is the question: is there a creative way of automatically downloading that pdf? (I am guessing there is no standard way to do this).

I looked at the source of the docviewer page and it seems that the download button calls some sort of javascript code. I guess the script produces a pdf of the submission + annotations and serves it up as a file to download...but I do not know and cannot think of a way of doing this in a batch way so that I can download all files...

Any ideas?

Thanks,

Alessandro

11 Replies
kona
Community Coach

 @a_mottura , greetings! Due to the technical nature of this question, I’m going to share it with the Canvas Developers group in the Community to see if they can help. You might also consider joining this group so you have access to their information and resources.

Kona

a_mottura
Community Participant

I thought I submitted it from the developer community page and assumed it would appear in the group...thanks!

I have 'followed' the group...

Alessandro

Hi Alessandro,
Just wondering how you went with the question? I've got a similar request as well.

James
Community Champion

 @a_mottura ,  @adamwarecs  

I spent a few minutes (15 or so) playing around with this tonight and might have figured something out. As I started to write it out, I figured out it was only successful if I loaded it within a browser, so you might need to use a headless browser to make the calls.

  1. Get the submission information through the Submissions API and look at the Preview URL.
  2. Call the Preview URL. You will get an HTML response, not JSON, that says "You are being redirected." but there is a link in there. There is also a Location header that you can use from the redirect.
  3. Load the redirected page. This is NOT an API call, do not include the authorization header. This returns all the sessionData information, the name, and all the other stuff that is needed. 
  4. Download the annotated PDF. Step 3 contains an annotated_pdf_download relative path or you can take the current URL and change the last part to /view/annotated.pdf or maybe /view/annotated.pdf?dl=1 or possibly just /annotated.pdf. Note that this is not an API call.

Here are some screen shots to help with the process (and to confirm that I have it right).

Get the submission information.

GET <instance>/api/v1/courses/896851/assignments/4994189/submissions/2175488

If there is an attachment, then the attachments property is an array. It should have a preview_url that looks something like this (URI decoded and line breaks added)

/api/v1/canvadoc_session?
blob={
"moderated_grading_whitelist":null,
"enable_annotations":true,
"enrollment_type":"admin",
"anonymous_instructor_annotations":false,
"submission_id":35142732,
"user_id":10000008296700,
"attachment_id":112491023,
"type":"canvadoc"
}
&hmac=<censored>
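
As a rough sketch of pulling that URL out of the submission JSON and peeking inside the blob: the helper names preview_urls and decode_blob are made up for illustration, and this assumes the response has the shape described above (an attachments array whose items carry a preview_url).

```python
import json
from urllib.parse import parse_qs, urlsplit


def preview_urls(submission):
    """Return the preview_url of each attachment in a submission dict."""
    attachments = submission.get("attachments") or []
    return [a["preview_url"] for a in attachments if a.get("preview_url")]


def decode_blob(preview_url):
    """Decode the JSON blob carried in a preview_url's query string."""
    query = parse_qs(urlsplit(preview_url).query)  # parse_qs also URI-decodes
    return json.loads(query["blob"][0])
```

Something like decode_blob(url)["enable_annotations"] would then tell you whether annotations are switched on for that attachment. The preview_url is relative, so it still needs the instance hostname in front before you call it.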

Call the preview URL.

Note that this is an API call so you'll need to include the token. I did not try playing around with the parameters in the blob, but there's an hmac code at the end to sign it as authentic, so you may not be able to. Luckily for you, the enable_annotations is true.

This comes back with a 302 Found status, which is a redirect rather than an error. The location header looks like this:

https://canvadocs.instructure.com/1/sessions/
<really long code, perhaps a JWT access token>
/view?theme=dark

You can also extract it from the href in the body.
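
A minimal sketch of that call, assuming session is a requests.Session (or anything with a compatible get method), preview_url has already been made absolute, and token is a Canvas API token. The function name redirect_location is hypothetical.

```python
import html
import re


def redirect_location(session, preview_url, token):
    """Request preview_url without following the redirect; return its target."""
    resp = session.get(
        preview_url,
        headers={"Authorization": f"Bearer {token}"},
        allow_redirects=False,  # we want the Location header, not the page
    )
    location = resp.headers.get("Location")
    if location:
        return location
    # Fall back to the link in the "You are being redirected" HTML body.
    match = re.search(r'href="([^"]+)"', resp.text)
    if match is None:
        raise ValueError(f"no redirect target in response ({resp.status_code})")
    return html.unescape(match.group(1))
```

With requests installed, redirect_location(requests.Session(), url, token) should hand back the canvadocs address. Passing the session in keeps the function easy to try against a stub.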

Load the redirected page

This is not an API call, don't use the authorization header that you normally use, but be sure to honor its request to set-cookie. This is HTML with an embedded script that sets window.DocViewer. Part of that is sessionData. Inside sessionData is an annotated_pdf_download and annotatedPdfUrl (for the pdfjs viewer) properties.

Note window.DocViewer is not JSON, it's JavaScript, so you might need to access window.DocViewer.sessionData.urls.annotated_pdf_download to get it. Alternatively, just take the current location and change the end to /view/annotated.pdf

Unfortunately, you cannot load either annotated_pdf_download or annotatedPdfUrl without letting the JavaScript on the page run (maybe it's establishing the session?) Anyway, without this step -- which I wasn't able to track down using Advanced REST Client (some use Postman) -- it wouldn't work. But if I load the page in the browser, even one I'm not logged into Canvas in, then it works.

It also has a URL for pdf_download (without the annotations).

Download the PDF

Once you have the correct address and things have been initialized, you can GET the path (it's a relative path without the hostname, so you may need to add the canvadocs.instructure.com to it).

Notes

Like I said, step 3 is the stumbling block. If I paste that URL into a different browser and then try the annotated_pdf_download link, I can get it. That's why I think a headless browser might work. I've also used Node with jsdom in some other cases (getting RollCall data out) and it may work here. Other people use selenium, but I don't know enough about it to recommend it.

It's now well over an hour after I started this and bed is calling (about 4 hours ago). I have other things that need done, so I won't be able to work on this right now, but hopefully this will get you closer to the goal.

You'll need to repeat this process for every submission, there's not a "download all submissions with annotations." There is a feature request for it: https://community.canvaslms.com/ideas/6232-bulk-download-of-annotated-crocodoc-documents-by-assignme... 
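
Since it has to be repeated per submission, one way to line up the work for a single student is to generate the Submissions API address for each course and assignment first. This is only a sketch: submission_endpoints is a made-up helper, the URL pattern is the one from the GET example earlier in this post, and the list of (course_id, assignment_id) pairs is assumed to come from somewhere else.

```python
def submission_endpoints(instance, user_id, assignments):
    """Yield one Submissions API URL per (course_id, assignment_id) pair."""
    for course_id, assignment_id in assignments:
        yield (
            f"{instance}/api/v1/courses/{course_id}"
            f"/assignments/{assignment_id}/submissions/{user_id}"
        )
```

Each URL it yields would then go through the same get-submission, preview, download sequence.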

This is awesome. @jward , here is the article we talked about.

James
Community Champion

 @adamwarecs ,

I found out how to download the file without loading the redirected page or running any JavaScript. I was able to download the file from within the Advanced Rest Client without loading a browser, which means that you can do it without having a headless browser.

  1. GET the submission information (API call)
  2. GET the preview URL (API call). Do not automatically follow the redirection, you need the location from it.
  3. Obtain the redirect location, but don't call it. Let's call most of it location so that it looks like location/view?theme=dark
  4. Replace the location/view?theme=dark with location/annotated.pdf and then POST to that address. No payload is necessary.
  5. GET location/annotated.pdf/is_ready until it returns { ready:true }
  6. GET location/annotated.pdf and save your file
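
Steps 4 to 6 could be sketched roughly as below. The assumptions: annotated_url is already location/annotated.pdf, session is a requests.Session (or compatible object) that does not send the Canvas authorization header, and is_ready answers with JSON containing a ready flag. The function name, the timeout and the poll interval are my own additions, not part of the steps above.

```python
import time


def fetch_annotated_pdf(session, annotated_url, dest, timeout=120.0, poll=1.0):
    """POST to start rendering, poll is_ready, then save the PDF to dest."""
    session.post(annotated_url)  # step 4: no payload needed
    deadline = time.monotonic() + timeout
    # Step 5: poll until the server reports the PDF is ready.
    while not session.get(annotated_url + "/is_ready").json().get("ready"):
        if time.monotonic() > deadline:
            raise TimeoutError(f"annotated PDF not ready after {timeout} s")
        time.sleep(poll)
    # Step 6: download the rendered file.
    resp = session.get(annotated_url)
    resp.raise_for_status()
    with open(dest, "wb") as f:
        f.write(resp.content)
    return dest
```

The timeout is there so that a large document that never becomes ready does not hang a batch run forever; adjust it to taste.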

Step 5 didn't take very long, but I was running it on a very small document with few annotations.

a_mottura
Community Participant

James,

this is awesome! I had gotten as far as your earlier post, but your final additional comment seems the right way to go about it. I will try it as soon as I have a minute, but I am confident it will do the trick!

Thank you!

Alessandro

Be interested to hear how this goes as well....

a_mottura
Community Participant

...and it does indeed work! So happy! Once steps 1, 2 and 3 are done (this is where I got stuck before), the rest is very simple (in Python):

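import requests
from time import sleep

# redirection_url ends in 'view?theme=dark' (15 characters); swap that for 'annotated.pdf'
annotated_pdf_url = redirection_url[0:-15] + 'annotated.pdf'
requests.post(annotated_pdf_url)  # ask the server to start building the annotated pdf
check_url = annotated_pdf_url + '/is_ready'
# poll until ready; parse the JSON, since in Python 3 .content is bytes and never equals a str
while not requests.get(check_url).json().get('ready'):
    sleep(1)
r = requests.get(annotated_pdf_url)
with open('annotated.pdf', 'wb') as f:
    f.write(r.content)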

The above assumes that redirection_url is the variable containing the url obtained from step 3 above. Obviously this requires importing requests and time.sleep.
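
The [0:-15] slice only works while the redirect ends in exactly view?theme=dark. A slightly more defensive sketch, assuming the redirect path always ends in a view segment as in this thread (to_annotated_pdf_url is a made-up name):

```python
from urllib.parse import urlsplit, urlunsplit


def to_annotated_pdf_url(redirection_url):
    """Swap the trailing 'view' segment (and its query) for 'annotated.pdf'."""
    parts = urlsplit(redirection_url)
    base, sep, last = parts.path.rpartition("/")
    if last != "view":
        raise ValueError(f"unexpected redirect path: {parts.path!r}")
    new_path = base + sep + "annotated.pdf"
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))
```

This drops whatever query string is attached, so a different theme parameter would not break the result.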