Build a Canvas Data warehouse on AWS

Introduction

Canvas Data provides a wealth of information that can be used in many interesting ways, but there are a few hurdles that can make it hard to even get started:

  • The Canvas Data API uses a different authentication mechanism (an HMAC-SHA256 request signature) than the one you're probably used to from the Canvas API; see the sketch just after this list.
  • Data is provided in compressed tab-delimited files. To be useful, they typically need to be loaded into some kind of database.
  • For larger tables, data is split across multiple files, each of which must be downloaded and loaded into your database.
  • The size of the data can be unwieldy for use with locally-running tools such as Excel.
  • Maintaining a local database and keeping up with schema changes can be tedious.
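
To make that first hurdle concrete, here's a minimal sketch of the request-signing scheme as I understand it from Instructure's Canvas Data API documentation (the message layout and the sync endpoint path are worth verifying against the current docs before relying on this):

  import base64
  import hashlib
  import hmac
  from datetime import datetime, timezone

  import requests

  API_KEY = 'your-key'        # placeholders -- substitute your real credentials
  API_SECRET = 'your-secret'
  HOST = 'api.inshosteddata.com'

  def signed_headers(method, path):
      """Build the Date and HMACAuth Authorization headers for a request."""
      timestamp = datetime.now(timezone.utc).strftime('%a, %d %b %Y %H:%M:%S GMT')
      # The signature covers a newline-joined list of request parts; the empty
      # strings stand in for Content-Type, Content-MD5 and the query string,
      # which a simple GET request doesn't use.
      message = '\n'.join([method, HOST, '', '', path, '', timestamp, API_SECRET])
      digest = hmac.new(API_SECRET.encode(), message.encode(), hashlib.sha256).digest()
      signature = base64.b64encode(digest).decode()
      return {'Authorization': f'HMACAuth {API_KEY}:{signature}', 'Date': timestamp}

  path = '/api/account/self/file/sync'
  resp = requests.get(f'https://{HOST}{path}', headers=signed_headers('GET', path))
  resp.raise_for_status()
  print(len(resp.json().get('files', [])), 'files in the current snapshot')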


This tutorial will show you how to build a data warehouse for your Canvas Data using Amazon Web Services. Besides solving all of the problems above, this cloud-based solution has several advantages compared to maintaining a local database:

  • You can easily keep your warehouse up to date by scheduling the synchronization process to run daily.
  • You can store large amounts of data very cheaply.
  • There are no machines to maintain, operating systems to patch, or software to upgrade.
  • You can easily share your Canvas Data Warehouse with colleagues and collaborators.


Before we get started

Before you begin this tutorial, you should make sure that you've got an AWS account available, and that you have administrator access. You'll also need an API key and secret for the Canvas Data API.


Experience with relational databases and writing SQL will be necessary in order to query your data. Experience with AWS and the AWS console will be helpful.


Be aware that you'll be creating AWS resources in your account that are not free, but the total cost of running this data warehouse should be around $10/month (based on my current snapshot size of 380GB). There's also a cost associated with the queries that you run against this data, but typical queries will only cost pennies.
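
To put rough numbers on that estimate (us-east-1 list prices at the time of writing; check the current pricing pages): S3 standard storage costs about $0.023 per GB-month, so a 380GB snapshot runs roughly $8.75/month, and Athena charges about $5 per TB of data scanned, so a query that scans a 1GB table costs about half a cent.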


All of the code used in this tutorial can be found on GitHub: https://github.com/Harvard-University-iCommons/canvas-data-aws


AWS services we'll use

We'll use several different AWS services to build the warehouse:

  • S3: we'll store all of the raw data files in an S3 bucket. Since S3 buckets are unlimited in size and extremely durable, we won't need to worry about running out of space or having a hard drive fail.
  • Lambda: we'll use serverless Lambda functions to synchronize files to the S3 bucket. Since we can launch hundreds or even thousands of Lambda functions in parallel, downloading all of our files is very fast.
  • SNS: we'll use the Simple Notification Service to let us know when the synchronization process runs.
  • Glue: we'll create a data catalog that describes the contents of our raw files. This creates a "virtual database" of tables and columns.
  • Athena: we'll use this analytics tool along with the Glue data catalog to query the data files directly, without having to load them into a database first.
  • CloudFormation: we'll use AWS' infrastructure automation service to set up all of the pieces above in a few easy steps!


Let's build a warehouse!

  1. Log into the AWS console and access the CloudFormation service.
  2. Click on the Create Stack button.
  3. On the next screen, leave the Template is ready and Amazon S3 URL options selected. Below, enter this S3 URL:
    https://huit-at-public-build-artifacts.s3.amazonaws.com/canvas-data-aws/canvas_data_aws.yaml
    Click Next.
  4. On the stack details screen, first enter a name for this stack. Something like "canvas-data-warehouse" is fine. Enter your Canvas Data API key and secret in the fields provided. Enter your email address (so that you can receive updates when the synchronization process runs). You can leave the default values for the other parameters. Click Next.
  5. On the stack options screen, leave all of the default values and click Next.
  6. On the review screen, scroll to the bottom and check the box to acknowledge that the template will create IAM resources (roles, in this case). Click the Create stack button, and watch as the process begins!
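
If you'd rather script this step than click through the console, the same stack can be created with a single CloudFormation API call. Here's a boto3 sketch; note that the parameter keys (ApiKey, ApiSecret, EmailAddress) are my guesses, so check the template's Parameters section for the real names:

  import boto3

  cfn = boto3.client('cloudformation')

  cfn.create_stack(
      StackName='canvas-data-warehouse',
      TemplateURL='https://huit-at-public-build-artifacts.s3.amazonaws.com/'
                  'canvas-data-aws/canvas_data_aws.yaml',
      # Parameter keys here are illustrative guesses, not the template's
      # confirmed names.
      Parameters=[
          {'ParameterKey': 'ApiKey', 'ParameterValue': 'your-canvas-data-api-key'},
          {'ParameterKey': 'ApiSecret', 'ParameterValue': 'your-canvas-data-api-secret'},
          {'ParameterKey': 'EmailAddress', 'ParameterValue': 'you@example.edu'},
      ],
      # The same acknowledgement as the checkbox on the review screen; use
      # CAPABILITY_NAMED_IAM instead if the template names its roles.
      Capabilities=['CAPABILITY_IAM'],
  )

  # Block until creation finishes (this takes several minutes).
  cfn.get_waiter('stack_create_complete').wait(StackName='canvas-data-warehouse')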


It'll take several minutes for all of the resources defined in the CloudFormation template to be created. You can follow the progress on the Events tab. Once the stack is complete, check your email -- you should have received a message from SNS asking you to confirm your subscription. Click on the link in the email and you'll be all set to receive updates from the data-synchronization process.


Now we're ready to load some data!


Loading data into the warehouse

Instructure's documentation for the Canvas Data API describes an algorithm for maintaining a snapshot of your current data (sketched in code just after this list):

  1. Make a request to the "sync" API endpoint, and for every file returned:
    • If the filename has been downloaded previously, do not download it
    • If the filename has not yet been downloaded, download it
  2. After all files have been processed:
    • Delete any local file that isn't in the list of files from the API
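
Here's that decision logic as a small, self-contained Python sketch (illustrative only -- this is not the stack's actual Lambda code):

  def plan_sync(remote_files, local_names):
      """Given the file list returned by the sync endpoint and the set of
      filenames already stored, return (files to download, names to delete)."""
      remote_names = {f['filename'] for f in remote_files}
      to_download = [f for f in remote_files if f['filename'] not in local_names]
      to_delete = local_names - remote_names
      return to_download, to_delete

  # One file is already present, one is new, and one is stale:
  downloads, deletions = plan_sync(
      [{'filename': 'course_dim-0000.gz', 'url': '...'},
       {'filename': 'course_dim-0001.gz', 'url': '...'}],
      {'course_dim-0000.gz', 'old_table-0000.gz'},
  )
  assert [f['filename'] for f in downloads] == ['course_dim-0001.gz']
  assert deletions == {'old_table-0000.gz'}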


The CloudFormation stack that you just created includes an implementation of this algorithm using Lambda functions. A scheduled job will run the synchronization process every day at 10am UTC, but right now we don't want to wait -- let's manually kick off the synchronization process and watch the initial set of data get loaded into our warehouse.
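
(For the curious: a daily 10:00 UTC run corresponds to a scheduled CloudWatch Events rule with an expression like cron(0 10 * * ? *) -- I'm inferring that from the stated schedule, so check the stack's resources for the exact rule it creates.)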


To do that, we just need to manually invoke the sync-canvas-data-files function. Back in the AWS console, access the Lambda service. You'll see the two functions that are used by our warehouse listed -- click on the sync-canvas-data-files function.


On this screen you can see the details of the Lambda function. We can use the Lambda console's test feature to invoke it: click on the Configure test events button, enter a name for your test event (like "manual"), and click Create. Now click on the Test button, and your Lambda function will be executed. The console will show that the function is running, and when it's complete you'll see the results. You'll also receive the results by email.
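
If you'd rather trigger the sync from a script, the console's Test button corresponds to a single Lambda Invoke call. A boto3 sketch (copy the function's exact name from the console, since the deployed name may carry a stack-specific prefix):

  import json
  import boto3

  client = boto3.client('lambda')

  response = client.invoke(
      FunctionName='sync-canvas-data-files',  # use the exact deployed name
      InvocationType='RequestResponse',       # wait for the result
      Payload=json.dumps({}),                 # an empty event, like the console test
  )
  print(json.loads(response['Payload'].read()))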


Querying your data

When the Lambda function above ran, in addition to downloading all of the raw data files, it created tables in our Glue data catalog, making them queryable with Athena. In the AWS console, navigate to the Athena service. You should see something similar to the screenshot below:

[Screenshot: the Athena query console]

You can now write SQL to query your data just as if it had been loaded into a relational database. You'll need to understand the schema, and Instructure provides documentation explaining what each table contains: https://portal.inshosteddata.com/docs


Some example queries:

  • Get the number of courses in each workflow state:
    SELECT workflow_state, count(*) FROM course_dim GROUP BY workflow_state;
  • Get the average number of published assignments per course in your active courses:
    SELECT AVG(assignments) FROM (
      SELECT COUNT(*) AS assignments
      FROM course_dim c
      JOIN assignment_dim a ON c.id = a.course_id
      WHERE c.workflow_state = 'available'
      AND a.workflow_state = 'published'
      GROUP BY c.id
    ) AS per_course;
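
You can also run these queries programmatically via the Athena API. A boto3 sketch (the database name canvas_data and the results bucket are assumptions -- use the Glue database the stack actually created and an S3 location you own):

  import time
  import boto3

  athena = boto3.client('athena')

  qid = athena.start_query_execution(
      QueryString='SELECT workflow_state, count(*) FROM course_dim GROUP BY workflow_state',
      QueryExecutionContext={'Database': 'canvas_data'},
      ResultConfiguration={'OutputLocation': 's3://your-results-bucket/athena/'},
  )['QueryExecutionId']

  # Poll until the query finishes.
  while True:
      state = athena.get_query_execution(QueryExecutionId=qid)['QueryExecution']['Status']['State']
      if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
          break
      time.sleep(1)

  # Print the result rows; note that the first row is the column headers.
  if state == 'SUCCEEDED':
      for row in athena.get_query_results(QueryExecutionId=qid)['ResultSet']['Rows']:
          print([col.get('VarCharValue') for col in row['Data']])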


Cleaning up

If you don't want to keep your data warehouse, cleaning up is easy: just delete the "raw_files" folder from your S3 bucket, and then delete the stack in the CloudFormation console. All of the resources that were created will be removed, and you'll incur no further costs. 
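
The teardown can also be scripted. A sketch (the bucket and stack names are placeholders -- the real bucket name is listed on your stack's Resources tab):

  import boto3

  # Empty the raw_files folder first; CloudFormation won't delete a
  # non-empty bucket.
  bucket = boto3.resource('s3').Bucket('your-canvas-data-bucket')
  bucket.objects.filter(Prefix='raw_files/').delete()

  boto3.client('cloudformation').delete_stack(StackName='canvas-data-warehouse')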


Good luck, and please let me know if you run into any trouble with any of the steps above!

In New FERPA requirements for cross-listed courses! and elsewhere, community members have commented on the problems of cross-listing. However, the ability to cross-list is controlled by the same permission as the ability to create/edit/delete sections. I find this to be odd: the ability to cross-list should not be tied to the ability to use sections within a course.


The file app/controllers/sections_controller.rb contains the following code (the relevant line is the authorized_action check):

  # @API Cross-list a Section
  # Move the Section to another course.  The new course may be in a different account (department),
  # but must belong to the same root account (institution).
  #
  # @returns Section
  def crosslist
    @new_course = api_find(@section.root_account.all_courses.not_deleted, params[:new_course_id])
    if authorized_action(@section, @current_user, :update) && authorized_action(@new_course, @current_user, :manage)   
      @section.crosslist_to_course(@new_course, updating_user: @current_user)
      respond_to do |format|
        flash[:notice] = t('section_crosslisted', "Section successfully cross-listed!")
        format.html { redirect_to named_context_url(@new_course, :context_section_url, @section.id) }
        format.json { render :json => (api_request? ? section_json(@section, @current_user, session, []) : @section) }
      end
    end
  end

These authorized_action checks mean that:

  • Unless the current user has the ability to update the section, the cross-listing will not occur.
  • Unless the current user can manage the target course, the cross-listing will not occur.

Unfortunately, cross-listing provides a sneak path for adding students to a course: if a teacher without administrative rights, but with the ability to create sections, cross-lists a section, the section's students are added to the target course, despite the fact that the teacher does not have the right to add students to that course.


Conversely, if the current user can manually add students to the target course, then they could always add each of the students in a section to the target course individually, so cross-listing grants them nothing new. This means that there needs to be a check in the above code on whether the current user can enroll students in the target course.


Therefore, the second test should be something similar to:

authorized_action(@new_course, @current_user, :manage)  && authorized_action(@new_course, @current_user, :manage_students) 


Based on looking at ./app/models/role_override.rb and spec/apis/v1/enrollments_api_spec.rb, the :manage_students permission enables the creation of student enrollments. Thus, unless the user is allowed to enroll students in the target course, the cross-listing would not occur.


If the permission to add students (permission: Users - add / remove students in courses) to imported courses (i.e., courses that are automatically created by SIS imports) is disabled for the "Teacher" role, there should not be a problem in allowing teachers to create/edit/delete sections while still meeting FERPA and other similar regulations, as there would be no way to cross-list a section's worth of students and each section would exist only within a single course. In fact, sections could be used to limit the visibility and interactions of students (and even teachers) to within their own section, thus advancing students' privacy.


Module Filters

Posted by David Lyons (Employee) on Jun 11, 2019

Disclaimer: This is a personal project, and is not endorsed by Instructure or Canvas LMS. Custom JavaScript must be maintained by the institution.

Most (great) product features come from a deep understanding of customers’ problems. It’s tempting to build every “good” or “obvious” feature someone can describe passionately, but that leads to thoughtless bloat that breaks the UX. And most things people describe as “obvious” actually have 10,000 questions standing between the comment and a well-researched, well-tested feature.

Sometimes the stars align and a conversation with an insightful person includes an offhanded “wouldn’t it be neat” comment that’s small enough to quickly prototype and test. And those are just the circumstances that led to this experiment: Module Filters.

Behold! Content filters!

The comment, which was part of a much larger conversation on organization and navigation, was

“Wouldn’t it be neat if you could filter by content type right on the Modules page in Canvas?”

and I agreed. Because Canvas supports custom JavaScript, I was able to quickly mock up a functioning prototype for all-important testing and validation.

This project was a good candidate for me to experiment with because it’s

  1. small in scope
  2. technically possible
  3. UI/UX not immediately obvious

Small in Scope

Small changes that a person or team can wrap their hands all the way around are ideal for quality, and for ensuring the change actually addresses the problem. Feature creep is very real, though, and I had to repeatedly slap my own hand and say “No! That’s not part of what is being tested here!” Keeping things in scope is tough in the face of the endless waterfall of “wouldn’t it be neat if it also…”

Technically Possible

What I mean by technically possible is that (1) the idea is literally possible at all, and (2) it’s within my ability to develop. JavaScript is great for uses exactly like this, and Canvas allows for this kind of customization. While the scope of the idea is small, if I had known nothing about HTML/CSS/JavaScript and had to learn all of that first, the overall project would have been a somewhat larger commitment.

UI/UX

This is where the bulk of the work (and my excitement for the idea) went. “Filters” in apps don’t have a universal UI: sometimes they’re checkboxes, or a dropdown menu, or toggles, or applied automatically while typing, etc. None of those is right or wrong; which direction one leans depends on the situation. My first version actually used unstyled checkboxes with labels (which looked awful), just to make sure my code worked. Thinking about the UI/UX also helped me with feature creep: checkboxes work well for a filter like content type, because a user might want any number of filter combinations on or off, but they wouldn’t work well for toggling a single binary state like “has due date”, for example. One might even want different types of filter simultaneously, which would require a lot of additional considerations.

Ultimately I settled on an on/off toggle using the corresponding content icon, rather than a checkbox with a label, to support any combination of content types being shown or hidden and to avoid adding text to the app UI. Keeping the filters to just content type made the UI more approachable and let me focus on the UX of how it might feel to actually use this feature.

Try It and Tell Me What You Think

I put the code on GitHub with an MIT license. If you play with it, I’d love to hear your thoughts, either on the repo or on Twitter.