
TL;DR

I wrote a script that creates a Catalog session and then downloads every user's transcript. The authentication involves a few subtle tricks, so if the code does not make sense, refer to the explanation that follows.

 

Why Did I Want to Download All Catalog User Transcripts?

Recently we have been looking into institution-branded storefront services other than Catalog. As part of this process, we needed a way to download every transcript before possibly sunsetting our Catalog instance. The simplest way to preserve the user data is to download each user's transcript. Unfortunately, the transcript PDFs are not accessible through the API. However, I have found a workaround and wanted to share it with others who are interested in creating a local store of transcript PDFs.

 

What Do I Need Before I Do This?

 You will need to have admin rights to both the Catalog instance and the Canvas instance that is linked to the Catalog instance. 

 

Why Can't I Use My API Token to Authenticate?

Accessing transcripts seems relatively trivial: one could simply iterate through the Catalog user IDs and make GET requests to '<CATALOG_DOMAIN>/transcripts/transcript.pdf?user_id=<USER_ID>'. Since this endpoint is not accessible with an API token, you must simulate a login by creating a session (see the Python requests documentation). Without a session, any request is redirected to the /login page. The login page contains hidden form values that are passed along in the login POST request. To obtain them, I use the lxml package (https://lxml.de/lxmlhtml.html) to parse the HTML for these hidden inputs, then add my username and password to the form before sending it in a POST request to log in.
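
As a rough illustration of the hidden-field step, here is a minimal sketch that collects hidden inputs from a login form and adds the credentials. It uses the standard library's html.parser instead of lxml so that it runs without extra packages. The sample HTML, its hidden field names and values, the class and function names, and the credentials are all made up for the example; only the two pseudonym_session field names come from the script in this post.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of hidden <input> elements inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("type") == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_form(html_text, username, password):
    """Return the POST payload: hidden values plus the credentials."""
    collector = HiddenInputCollector()
    collector.feed(html_text)
    form = dict(collector.fields)
    form["pseudonym_session[unique_id]"] = username
    form["pseudonym_session[password]"] = password
    return form


# Made-up login page; a real page will carry different hidden values.
SAMPLE_LOGIN_HTML = """
<form action="/login/canvas" method="post">
  <input type="hidden" name="utf8" value="x">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="pseudonym_session[unique_id]">
  <input type="password" name="pseudonym_session[password]">
</form>
"""

form = build_login_form(SAMPLE_LOGIN_HTML, "admin@example.edu", "secret")
print(form)
```

The visible text and password inputs are skipped by the collector and filled in afterwards, which mirrors what the script does with lxml.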

 

Why Can't I Make a Basic Request Now That I am 'Logged In'?

After simulating a login, I found that a GET request to /transcripts/transcript.pdf?user_id=<USER_ID> redirected me to /transcripts/transcript.pdf, which is my own transcript. When I looked at the redirect history (see the function history()), I noticed that the user_id parameter was lost in the first redirect, which went to /login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf.

However, if I requested the page /login with params force_login=0 and target_uri=%2Ftranscripts%2Ftranscript.pdf%3Fuser_id%3D<USER_ID>, the request would ultimately redirect to the desired page! I am not fully sure why this works, and if anyone has an idea why, I would love to know.
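
The percent-encoded target_uri does not have to be typed by hand. The following is a minimal sketch that builds the same URL with urllib.parse from the standard library; the function name and the example domain are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def transcript_login_url(catalog_domain, user_id):
    """Build the /login URL whose target_uri carries the user_id parameter."""
    # The inner URL is the page we actually want after the login redirect.
    target = "/transcripts/transcript.pdf?" + urlencode({"user_id": user_id})
    # urlencode percent-encodes '/', '?' and '=' inside the target_uri value.
    query = urlencode({"force_login": 0, "target_uri": target})
    return f"{catalog_domain}/login?{query}"


url = transcript_login_url("https://catalog.example.edu", 123)
print(url)

# Decoding the query string recovers the original target, user_id included.
decoded_target = parse_qs(urlsplit(url).query)["target_uri"][0]
print(decoded_target)
```

Because the user_id sits inside the encoded target_uri value rather than in the outer query string, it is carried along as part of that one value.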

 

 

 

Python Script

Some information has been omitted for privacy and clarity. Please post a comment if anything is unclear.

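import configparser

import lxml.html
import requests

# Assumption: settings are read from a local INI file. The original post
# omitted this setup, so the file name here is only a placeholder.
config = configparser.ConfigParser()
config.read('config.ini')

# Fill in your details here to be posted to the login form.
username = config.get('catalog','username')
password = config.get('catalog','password')
canvas_catalog_domain = config.get('instance','canvas_catalog')
catalog_domain = config.get('instance', 'catalog')
catalog_headers = {
    'Authorization': 'Token token="%s"' % (config.get('auth_token','catalog')) ,
}
catalog_ids = []  # A LIST OF CATALOG USER IDS (fill in)

# Assumption: errors are appended to a plain text log file (placeholder name).
error_log = open('error_log.txt', 'a')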

# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
    login = s.get(catalog_domain+'/login/canvas')
    # print the html returned or something more intelligent to see if it's a successful login page.
    #print(login.text)
    login_html = lxml.html.fromstring(login.text)
    hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
    form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
    #print("form: ",form)

    form['pseudonym_session[unique_id]']= username
    form['pseudonym_session[password]']= password
    response = s.post(catalog_domain+'/login/canvas',data=form)
    #print(response.url, response.status_code) # gets <domain>?login_success=1 200
    #pp(response.headers)
    # An authorised request. (requests exposes the status as status_code, not code.)
    if response.status_code != 200:
        raise Exception("Login failed with status %s" % response.status_code)

    for user_id in catalog_ids:
        #print('user_id: ',user_id)
        # getting transcript pdf
        # str() keeps the concatenation working when the IDs are integers.
        r = s.get(catalog_domain+'/login?force_login=0&target_uri=%2Ftranscripts%2Ftranscript.pdf%3Fuser_id%3D' + str(user_id), headers=catalog_headers)
        history(r)
        if int(r.status_code) != 200:
            # possible error
            error_log.write('%s -- %s ERROR \n' % (r.url, r.status_code))
        else:  # lets continue getting info
            filename = 'pdfs/%s_catalog_transcript.pdf' % (user_id)
            with open(filename, 'wb') as f:
                f.write(r.content)


## HELPER FUNCTION TO TRACE THE REQUEST ##
## (when running this as one script, define it above the 'with' block that calls it)
def history(r):
    if r.history:
        print("Request was redirected")
        for resp in r.history:
            print('\t', resp.status_code, resp.url)
        print("Final destination:")
        print('\t', r.status_code, r.url)
    else:
        print("Request was not redirected")

 

 

I hope this post helps other Canvas Developers out there! Feel free to contact me if you are trying to troubleshoot running this script or a variation of it. 

 


Maddy Hodges 

Courseware Developer
University of Pennsylvania

As part of experiments on using an LTI interface to implement a dynamic quiz, I built a version of Canvas that runs in a virtual machine (specifically, an image that VirtualBox can run).

Advantages of this approach include:

  • easy to try things without any risk to the real Canvas infrastructure
  • easy to give an OVA image of the VM to students

The image was built on Ubuntu Linux by following the Quick Start instructions at https://github.com/instructure/canvas-lms/wiki/Quick-Start.

Details of the operation of the different containers that comprise this system will be described in a forthcoming Bachelor's thesis.

I should note that bringing up the Canvas instance with docker-compose takes a very long time.

In order to do some experiments with the dynamic quiz, I created some fake users and enrolled them in a course. Additionally, since one of the things I would like the dynamic quiz to exploit is knowledge of which program of study these users are in, I augmented each user's custom data with details of their program.

The initial version of the program (create-fake-users-in-course.py) can be found at  https://github.com/gqmaguirejr/Canvas-tools .

The result is a course with a set of fake users, as can be seen in the list of users in this course:

[Image: List of users in the course]

An example of fake program data is:

custom data for user Ann FakeStudent is {'data': {'programs': [{'code': 'CINTE', 'name': 'Degree Programme in Information and Communication Technology', 'start': 2016}]}}
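
To show how a dynamic quiz might read this structure, here is a minimal sketch that pulls the program codes out of a custom data dictionary shaped like the example above. The function name is hypothetical, and the sketch assumes the data has already been fetched from Canvas.

```python
def program_codes(custom_data):
    """Return the program codes found in a user's custom data dictionary."""
    # Missing keys fall back to empty containers, so users without
    # program data simply yield an empty list.
    programs = custom_data.get("data", {}).get("programs", [])
    return [program["code"] for program in programs if "code" in program]


# The custom data shown above for the fake user Ann FakeStudent.
ann_custom_data = {
    'data': {
        'programs': [
            {
                'code': 'CINTE',
                'name': 'Degree Programme in Information and Communication Technology',
                'start': 2016,
            }
        ]
    }
}

codes = program_codes(ann_custom_data)
print(codes)
```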

 

Further details can be found in "Creating fake users and enrolling them into a course: Chip sandbox".