Guide - Using Canvas Data v2 to download exports of tables using PowerShell

JosephGullo

At California Northstate University, we are implementing Canvas Data v2, but we have found that unless you use the Python client library (which is not an option for us; we have no Python developers or anyone to maintain a Python environment), the documentation is extremely inadequate.  We are communicating with the API using PowerShell to download selected data exports, which we then import into Microsoft SQL Server by other means.  You probably don't want to do this, but as a non-developer I found no sample code out there, and it was a MASSIVE struggle.  I want to share what we came up with so other people aren't starting from scratch.

First, you will need to create a Canvas Data 2 API key, which is documented here: https://community.canvaslms.com/t5/Admin-Guide/How-do-I-generate-a-Canvas-Data-2-API-key/ta-p/560955. For security, these credentials should be stored as system environment variables, but for this walkthrough I'm stashing them in PowerShell variables because it makes things easier to explain.  Assuming you have them, the PowerShell below will generate an API access token, which persists for a limited time (the exact expiration is returned in the response from the $ACCESS_TOKEN Invoke-RestMethod call).  If it expires, you just re-run the request and get a new token.

 

# Credentials; in production, read these from environment variables instead
$CLIENT_ID = "ENVIRONMENT VARIABLE, REDACTED FOR DEMONSTRATION"
$SECRET = "ENVIRONMENT VARIABLE, REDACTED FOR DEMONSTRATION"

# Basic auth header: base64-encoded "client_id:secret"
$Header = @{ "Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("${CLIENT_ID}:${SECRET}")) }
$Body = @{
    "grant_type" = "client_credentials"
}

# Request an access token from the Instructure identity service
$ACCESS_TOKEN = (Invoke-RestMethod -Method POST -Headers $Header -ContentType "application/x-www-form-urlencoded" -Body $Body -Uri "https://api-gateway.instructure.com/ids/auth/login").access_token
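If you want to see exactly how long the token lasts, capture the whole login response instead of just its access_token property. A quick sketch, assuming the login endpoint follows the standard OAuth client-credentials convention of returning an expires_in field (verify the field name against your own output):

```powershell
# Capture the full token response rather than just .access_token
$TokenResponse = Invoke-RestMethod -Method POST -Headers $Header -ContentType "application/x-www-form-urlencoded" -Body $Body -Uri "https://api-gateway.instructure.com/ids/auth/login"

$ACCESS_TOKEN = $TokenResponse.access_token
# expires_in is the token lifetime in seconds (a standard OAuth field; an
# assumption here, so check the response yourself)
"Token expires in $($TokenResponse.expires_in) seconds"
```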

 

Great!  Now we can use that access token to do some querying.  I want to focus on two broad queries that are useful before we actually fetch our data.  First, I want to get a list of tables.  This is one of the documented examples on their API page here: https://api-gateway.instructure.com/doc/ (which is infuriating to use; you cannot easily link to a specific section of the document because the page doesn't update the URL).  The following code uses the existing access token to list the tables the API key has access to:

 

# All DAP endpoints authenticate with the access token in an x-instauth header
$ReqHeader = @{ "x-instauth" = "${ACCESS_TOKEN}" }
(Invoke-RestMethod -Headers $ReqHeader -Method GET -ContentType "application/json" -Uri "https://api-gateway.instructure.com/dap/query/canvas/table").tables

 

Run that and review the output yourself; in my case, I want the "users" table as an example.  We can request the schema for this table directly with the following query:

 

$ReqHeader = @{ "x-instauth" = "${ACCESS_TOKEN}" }
# Fetch the schema for the users table and save it for later reference.
# ConvertTo-Json defaults to a depth of 2, which truncates nested schemas,
# so ask for more depth explicitly.
(Invoke-RestMethod -Headers $ReqHeader -Method GET -ContentType "application/json" -Uri "https://api-gateway.instructure.com/dap/query/canvas/table/users/schema").schema | ConvertTo-Json -Depth 10 | Out-File users.schema.json
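As a quick sanity check on that file, you can load it back in and list the column names. This sketch assumes the saved schema follows ordinary JSON Schema conventions, with the columns as keys under a properties object; the exact shape isn't guaranteed here, so inspect your own users.schema.json:

```powershell
# Read the saved schema back in as an object
$Schema = Get-Content users.schema.json -Raw | ConvertFrom-Json

# List the property names under .properties — these should be the column
# names if the file follows standard JSON Schema layout (an assumption)
$Schema.properties.PSObject.Properties.Name
```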

 

This may be useful to you later.  Finally, we need to download our data; in our case we want a TSV.  You don't just get a link: your data request becomes a job, and when the job completes it creates an object, and that object has a URL which is only valid for a short time.  For that reason this is a multi-step process, compounded in complexity by the fact that (in my opinion) PowerShell doesn't handle JSON-to-object conversion gracefully.  The code to do all this for the users table is below.  If you follow along: we first create a request for a TSV file, then poll the job status until it reads complete, then find the object created by that job and download the file at the indicated URL, and in my case, we use 7-Zip to extract it:

 

$ReqHeader = @{ "x-instauth" = "${ACCESS_TOKEN}" }
$ReqBody = @{
    "format" = "tsv"
    "mode" = "expanded"
}
$ReqBodyJson = $ReqBody | ConvertTo-Json

# Kick off an export job for the users table
$RequestID = Invoke-RestMethod -Headers $ReqHeader -Method POST -ContentType "application/json" -Body $ReqBodyJson -Uri "https://api-gateway.instructure.com/dap/query/canvas/table/users/data"

# Poll the job until it finishes; also stop if the job reports failure,
# so a failed job doesn't leave us polling forever
$JobStatusURL = "https://api-gateway.instructure.com/dap/job/" + $RequestID.id
$RequestStatus = Invoke-RestMethod -Headers $ReqHeader -Method GET -ContentType "application/json" -Uri $JobStatusURL
$RequestStatus
while ( $RequestStatus.status -ne 'complete' -and $RequestStatus.status -ne 'failed' ) {
    Start-Sleep 5
    $RequestStatus = Invoke-RestMethod -Headers $ReqHeader -Method GET -ContentType "application/json" -Uri $JobStatusURL
    $RequestStatus.status
}

# Exchange the object IDs from the completed job for short-lived download URLs
$ObjReqBody = ConvertTo-Json $RequestStatus.objects
$ObjReqBody
$ObjUrlID = ($RequestStatus.objects).id
$URLs = Invoke-RestMethod -Headers $ReqHeader -Method POST -ContentType "application/json" -Body $ObjReqBody -Uri "https://api-gateway.instructure.com/dap/object/url"
$URLs

# Download the gzipped TSV and extract it with 7-Zip
foreach ( $url in $URLs ) {
    ($url.urls.$ObjUrlID).url
    Invoke-WebRequest -Headers $ReqHeader -Method GET -Uri ($url.urls.$ObjUrlID).url -OutFile users.tsv.gz
    & "C:\Program Files\7-Zip\7z.exe" x users.tsv.gz
}

 

OK, cool, it works!  But there's a lot of code reuse in there, and ultimately we want to download MULTIPLE tables, not just the users table; there's no reason to write all of that out for each one.  Ideally, we just pass in a list of tables to download and have them appear.  That gets us the following:

 

# Credentials are read from system environment variables
$CLIENT_ID = $Env:CanvasClientID
$SECRET = $Env:CanvasSecret
$datestamp = Get-Date -Format "yyyy.MM.dd"

# Get an access token (same as above)
$Header = @{ "Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("${CLIENT_ID}:${SECRET}")) }
$Body = @{
    "grant_type" = "client_credentials"
}
$ACCESS_TOKEN = (Invoke-RestMethod -Method POST -Headers $Header -ContentType "application/x-www-form-urlencoded" -Body $Body -Uri "https://api-gateway.instructure.com/ids/auth/login").access_token

# The request header and body are the same for every table, so build them once
$ReqHeader = @{ "x-instauth" = "${ACCESS_TOKEN}" }
$ReqBodyJson = @{
    "format" = "tsv"
    "mode" = "expanded"
} | ConvertTo-Json

$TableList = @(
	"courses"
	, "accounts"
	, "users"
	, "pseudonyms"
	, "enrollment_terms"
	, "roles"
	, "enrollments"
	, "scores"
)
foreach ($Table in $TableList) {

	# Kick off an export job for this table
	$RequestURL = "https://api-gateway.instructure.com/dap/query/canvas/table/" + $Table + "/data"
	$RequestID = Invoke-RestMethod -Headers $ReqHeader -Method POST -ContentType "application/json" -Body $ReqBodyJson -Uri $RequestURL

	# Poll until the job finishes; also stop on failure so we don't loop forever
	$JobStatusURL = "https://api-gateway.instructure.com/dap/job/" + $RequestID.id
	$RequestStatus = Invoke-RestMethod -Headers $ReqHeader -Method GET -ContentType "application/json" -Uri $JobStatusURL
	$RequestStatus
	while ( $RequestStatus.status -ne 'complete' -and $RequestStatus.status -ne 'failed' ) {
		Start-Sleep 5
		$RequestStatus = Invoke-RestMethod -Headers $ReqHeader -Method GET -ContentType "application/json" -Uri $JobStatusURL
		$RequestStatus.status
	}

	# Exchange the job's object IDs for short-lived download URLs
	$ObjReqBody = ConvertTo-Json $RequestStatus.objects
	$ObjUrlID = ($RequestStatus.objects).id
	$URLs = Invoke-RestMethod -Headers $ReqHeader -Method POST -ContentType "application/json" -Body $ObjReqBody -Uri "https://api-gateway.instructure.com/dap/object/url"
	foreach ( $url in $URLs ) {
		# Download the gzipped TSV, extract it with 7-Zip, then remove the archive
		$DownloadFileName = $datestamp + "." + $Table + ".tsv.gz"
		Invoke-WebRequest -Headers $ReqHeader -Method GET -Uri ($url.urls.$ObjUrlID).url -OutFile $DownloadFileName
		& "C:\Program Files\7-Zip\7z.exe" x $DownloadFileName
		Remove-Item $DownloadFileName
	}
}

 

This assumes the API key information is stored in environment variables, that 7-Zip is installed in Program Files as shown, and that the table list above is the one you want (you can add more tables from the list returned by the table query earlier).  In the directory it is run from, the script will produce a date-stamped TSV file for each table, as desired.

I am not sure anyone out there is actually going to use this as we did, but I think the pieces here are instructive for someone looking to dip their toes into the API when they're not a Python developer.

 

EDIT: I am attaching (here and in the first two comments) the PowerShell script we run as a scheduled task to download the data, plus the 10 SQL scripts that import the tables we specifically need, run as a SQL Server Agent job.  These handle all the data-type validation and null handling.  The paths would of course need to be updated for anyone else to use these, and make sure 7-Zip is installed.
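For anyone wiring up the scheduled-task piece themselves, here is a minimal sketch using the built-in ScheduledTasks cmdlets. The script path, task name, and schedule below are placeholders, not what we actually use, so adjust them for your environment:

```powershell
# Run the download script every day at 3:00 AM.
# C:\Scripts\Get-CanvasData.ps1 is a placeholder path; point it at your script.
$Action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\Get-CanvasData.ps1"
$Trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName "Canvas Data 2 Export" -Action $Action -Trigger $Trigger
```

Remember that the script reads its credentials from system-level environment variables, so the account the task runs under needs to be able to see them.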