Mastering Canvas Data 2: Unleash the Power of Data-Driven Decision-Making in Education

kyle_cole
Instructure

What is Canvas Data 2?


Canvas Data 2 is a comprehensive analytics solution. It lets administrators and institutions access and analyze data on student and course activity within Canvas, including user logins, assignment submissions, grades, and more. With insight into user behavior and LMS usage patterns, educators and institutions can make data-driven decisions, improve teaching and learning experiences, enhance student outcomes, and tailor their strategies and support to the needs of their students and faculty.


What can Canvas Data do?

  • Download raw data from a Canvas instance; the data is kept fresh to within 4 hours.
  • There are 89 unique datasets (tables). A list of tables can be found here: Canvas Data 2 Schema
  • Data can be downloaded as JSON, CSV, TSV, and Parquet.

What is needed for Canvas Data?

  1. Canvas Data Access: Request access from your Customer Success Manager. Once access is granted, make sure you have the “Data Services Manage” permission enabled so that you can access the API gateway.

  2. Data Warehouse: Set up a data warehouse or database where you can store the Canvas Data extracts. Common choices include Amazon Redshift, Google BigQuery, Microsoft Azure SQL Data Warehouse, or an on-premises solution. The data warehouse should have the capacity to handle large volumes of data efficiently.

  3. ETL Process: Develop an Extract, Transform, Load (ETL) process to regularly pull data from Canvas Data exports and load it into your data warehouse. This process typically involves scripting and automation to ensure data is up-to-date and accurate.

  4. Database Administration: Assign responsibilities for database administration and maintenance. This includes managing schema changes, optimizing queries, and ensuring data integrity.

  5. Analytics Tools: Use data analytics tools or platforms (e.g., Tableau, Power BI, Looker) to create visualizations, dashboards, and reports. These tools help you extract insights from Canvas Data and make informed decisions.

  6. Regular Maintenance: Continuously monitor and maintain the Canvas Data infrastructure. Ensure that data updates occur on schedule and address any issues promptly.

  7. Scalability: Plan for scalability as your institution grows. Ensure that your data infrastructure can handle increased data volumes and evolving analytics requirements.



A TLDR of Steps to use Canvas Data:




Step One:

Generate a Canvas Data 2 API Key => How do I generate a Canvas Data 2 API key?


Step Two: 

Now we download our Canvas Data files. This can be done via API call or the Command Line Interface (CLI) tool. 

Using the API with Postman:
Using the CLI tool:
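
Whichever route you use, the download ends up as files on disk, and it can help to peek inside one before loading it anywhere. The snippet below is a minimal sketch of that check, not part of the official tooling. It assumes the file is gzip-compressed JSON with one record per line and that each record nests its columns under `key` and `value`; the file name, that layout, and the sample rows are all assumptions made for illustration, so compare them with your own download.

```python
import gzip
import json
import tempfile
from pathlib import Path


def read_records(path):
    """Yield one flat dict per line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumed layout: columns are split between "key" and "value".
            flat = dict(record.get("key", {}))
            flat.update(record.get("value", {}))
            yield flat


def write_sample(path):
    """Write two made-up rows so the sketch runs without a real download."""
    rows = [
        {"key": {"id": 1}, "value": {"name": "Biology 101", "workflow_state": "available"}},
        {"key": {"id": 2}, "value": {"name": "Chemistry 201", "workflow_state": "completed"}},
    ]
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


sample_path = Path(tempfile.mkdtemp()) / "courses.json.gz"  # hypothetical file name
write_sample(sample_path)
records = list(read_records(sample_path))
print(len(records), "records; columns:", sorted(records[0]))
```

To try it on a real file, point `read_records` at the path of your download instead of the generated sample.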

Step Three:

Load the data into your database. The loading process differs from database to database, but the CLI tool can synchronize directly with MySQL and Postgres databases => Obtain a full snapshot of a table in a database or data warehouse
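
If your database is not one the CLI tool synchronizes with, a short script can do the load. The sketch below shows the general shape using Python's built-in `sqlite3` module as a stand-in for a real warehouse. The `enrollments` table, its columns, and the sample rows are made up for illustration, and every column is loaded as untyped text, so treat this as a starting point rather than a finished loader.

```python
import csv
import io
import sqlite3

# Made-up extract standing in for a downloaded CSV file.
SAMPLE_CSV = """id,course_id,user_id,type,workflow_state
1,10,100,StudentEnrollment,active
2,10,101,StudentEnrollment,active
3,11,100,TeacherEnrollment,active
"""


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header, insert every row, return the row count."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Names come from a trusted file here; never interpolate untrusted input into SQL.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, "enrollments", io.StringIO(SAMPLE_CSV))
total = conn.execute('SELECT COUNT(*) FROM "enrollments"').fetchone()[0]
print(f"loaded {loaded} rows; table now holds {total}")
```

For a real extract, replace the `io.StringIO` object with an opened file (`open(path, newline="")`) and the in-memory connection with your own database driver.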


Step Four:

Compose an SQL query to retrieve the desired data.
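
As an illustration of the kind of query this step refers to, the sketch below counts actively enrolled students per course. It is not the query from the original screenshot. The `courses` and `enrollments` tables, their columns, and the values `StudentEnrollment` and `active` are assumptions modeled on a typical Canvas schema, and the rows are invented; check the names against the Canvas Data 2 schema before reusing the SQL. SQLite is used only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tiny made-up dataset so the query has something to run against.
conn.executescript("""
CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollments (
    id INTEGER PRIMARY KEY, course_id INTEGER, user_id INTEGER,
    type TEXT, workflow_state TEXT
);
INSERT INTO courses VALUES (10, 'Biology 101'), (11, 'Chemistry 201');
INSERT INTO enrollments VALUES
    (1, 10, 100, 'StudentEnrollment', 'active'),
    (2, 10, 101, 'StudentEnrollment', 'active'),
    (3, 10, 102, 'TeacherEnrollment', 'active'),
    (4, 11, 100, 'StudentEnrollment', 'active'),
    (5, 11, 103, 'StudentEnrollment', 'deleted');
""")

# Count distinct students with an active student enrollment in each course.
QUERY = """
SELECT c.name, COUNT(DISTINCT e.user_id) AS enrolled_students
FROM enrollments AS e
JOIN courses AS c ON c.id = e.course_id
WHERE e.type = 'StudentEnrollment'
  AND e.workflow_state = 'active'
GROUP BY c.id, c.name
ORDER BY enrolled_students DESC, c.name;
"""

results = conn.execute(QUERY).fetchall()
for name, count in results:
    print(f"{name}: {count}")
```

The teacher enrollment and the deleted enrollment are filtered out, which is why the counts are lower than the number of rows per course.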

[Figures: example query and example query response]

Step Five (Optional):

Now let's build out a visual for our report. You will want to use a visualization tool such as Tableau, Power BI, or Looker.

[Figure: Enrolled Students (example built in Looker)]

2 Comments
mbmacdonald
Community Participant

Hi @kyle_cole - Very helpful post. Can you explain, in layman's terms, why step 3 and step 4 are necessary? I'm not saying that I think they're not, I just want to understand.

Why can't you make an API call to download the data and then just do what you want with it? For example, make an API call to download the data, then write your own script to manipulate the data and/or load it into a visualization tool?

I'm guessing that doing that would mean the data would be stored on the local machine of the person making the call - which could potentially work for a one-time process, but would not be good for a situation where data is downloaded automatically at regular intervals. Is that correct? Or is it not possible for another reason?

kyle_cole
Instructure
Author

@mbmacdonald Hey! You definitely can do it that way. If you are working with a couple of tables, that should be manageable, but if you are downloading the entire dataset, I do not see that being very efficient or scalable.