Your Community is getting an upgrade!
Read about our partnership with Higher Logic and how we will build the next generation of the Instructure Community.
Canvas Data provides a wealth of information that can be used in many interesting ways, but there are a few hurdles that can make it hard to even get started:
This tutorial will show you how to build a data warehouse for your Canvas Data using Amazon Web Services. Besides solving all of the problems above, this cloud-based solution has several advantages compared to maintaining a local database:
Before you begin this tutorial, you should make sure that you've got an AWS account available, and that you have administrator access. You'll also need an API key and secret for the Canvas Data API.
Experience with relational databases and writing SQL will be necessary in order to query your data. Experience with AWS and the AWS console will be helpful.
Be aware that you'll be creating AWS resources in your account that are not free, but the total cost to run this data warehouse should be under about $10/month (based on my current snapshot size of 380GB). There's also a cost associated with the queries that you run against this data, but typical queries will only cost pennies.
All of the code used in this tutorial can be found in GitHub:
GitHub - Harvard-University-iCommons/canvas-data-aws: Build a Canvas Data warehouse on AWS
We'll use several different AWS services to build the warehouse:
It'll take several minutes for all of the resources defined in the CloudFormation template to be created. You can follow the progress on the Events tab. Once the stack is complete, check your email -- you should have received a message from SNS asking you to confirm your subscription. Click on the link in the email and you'll be all set to receive updates from the data-synchronization process.
Now we're ready to load some data!
Instructure's documentation for the Canvas Data API describes an algorithm for maintaining a snapshot of your current data:
The CloudFormation stack that you just created includes an implementation of this algorithm using Lambda functions. A scheduled job will run the synchronization process every day at 10am UTC, but right now we don't want to wait -- let's manually kick off the synchronization process and watch the initial set of data get loaded into our warehouse.
To do that, we just need to manually invoke the sync-canvas-data-files function. Back in the AWS console, access the Lambda service. You'll see the two functions that are used by our warehouse listed -- click on the sync-canvas-data-files function.
On this screen you can see the details about the Lambda function. We can use the AWS Lambda Console's test feature to invoke the function. Click on the Configure test events button, enter a name for your test event (like "manual"), and click Create. Now click on the Test button, and your Lambda function will be executed. The console will show an indication that the function is running, and when it's complete you'll see the results. You'll also receive the results in your email box.
When the Lambda function above ran, in addition to downloading all of the raw data files, it created tables in our Glue data catalog making them queryable in AWS Athena. In the AWS console, navigate to the Athena service. You should see something similar to the screenshot below:
You can now write SQL to query your data just as if it had been loaded into a relational database. You'll need to understand the schema, and Instructure provides documentation explaining what each table contains: https://portal.inshosteddata.com/docs
SELECT workflow_state, count(*) FROM course_dim GROUP BY workflow_state;
SELECT AVG(assignments) FROM (SELECT COUNT(*) AS assignments
FROM course_dim c, assignment_dim a
WHERE c.id = a.course_id
AND c.workflow_state = 'available'
AND a.workflow_state = 'published'
GROUP BY c.id);
If you don't want to keep your data warehouse, cleaning up is easy: just delete the "raw_files" folder from your S3 bucket, and then delete the stack in the CloudFormation console. All of the resources that were created will be removed, and you'll incur no further costs.
Good luck, and please let me know if you run into any trouble with any of the steps above!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
API fan; Python developer; tool builder; system integrator; Canvas administrator. Technical Architect at Harvard University.
To interact with Panda Bot, our automated chatbot, you need to sign up or log in:
Sign InTo interact with Panda Bot, our automated chatbot, you need to sign up or log in:
Sign In