Canvas Data FAQ

Document created by smccann@instructure.com (Employee) on Nov 15, 2016. Last modified by snelson on Feb 25, 2020.
Version 18

 

Overview

This document is meant to serve as a place to find answers to common questions you may have about Canvas Data. Instructure will attempt to keep this document updated as trends in questions arise that are not addressed within this FAQ. Please feel free to add any additional questions as a comment, and we will do our best to answer them and add to this document.

 

General Canvas Data Questions

Question | Answer

How do I add another admin to the Canvas Data Portal?

How do I manage Canvas Data admin users? 

Where do I find the data schema?

Canvas Data API Documentation
A JSON version can also be downloaded from the API with the endpoint: GET /api/schema/latest

What are 'dim' and 'fact'?

Canvas Data uses Kimball methodology to create a star-schema. "dim" stands for dimension and "fact" stands for fact. More info on star schemas and Kimball methodology can be found at https://en.wikipedia.org/wiki/Dimensional_modeling

 

Essentially, facts hold the measurable data about an item. For example, you may use a fact to count enrollments. Dimensions hold the descriptive attributes of an item, meaning a dimension could be used to show how many of those enrollments are "Teacher" enrollments.
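To make the fact/dimension split concrete, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for a real warehouse. The two tables are heavily simplified, and the column names (enrollment_id, type) and the sample rows are assumptions for illustration; check the schema documentation for the actual columns.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for two Canvas Data tables.

    The columns here are a simplified assumption, not the full schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE enrollment_dim (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE enrollment_fact (enrollment_id INTEGER, course_id INTEGER);
        INSERT INTO enrollment_dim VALUES
            (1, 'TeacherEnrollment'),
            (2, 'StudentEnrollment'),
            (3, 'StudentEnrollment');
        INSERT INTO enrollment_fact VALUES (1, 10), (2, 10), (3, 10);
        """
    )
    return conn


def count_enrollments_by_type(conn):
    """Count fact rows, grouped by an attribute taken from the dimension."""
    rows = conn.execute(
        """
        SELECT d.type, COUNT(*) AS n
        FROM enrollment_fact AS f
        JOIN enrollment_dim AS d ON d.id = f.enrollment_id
        GROUP BY d.type
        ORDER BY d.type
        """
    ).fetchall()
    return dict(rows)
```

The fact table supplies the rows being counted, while the dimension supplies the label ("type") used to group them.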

Is historical data available for the requests table?

Yes. Historical data is typically loaded on the second Friday of even-numbered months following the activation of Canvas Data in Canvas.

 

Note: If you have missed your historical load, please contact your CSM to have the data reloaded.

How far back does the historical data go?

With the exception of the requests table, all tables include historical data going back to the creation of your Canvas account, or to 2014-03-01.

What time are the daily files/Redshift/API available for download?

Typically, the dump completes around 2:00 AM, Mountain Time. This data is the same across the flat files available in the Canvas Data Portal UI, the data used for the population of Redshift, and files available via the Canvas Data API endpoints.

 

 

Note: This time is not guaranteed as many external factors may cause the load to be later in the morning.

Does each day's flat file contain data only for that day, or does it contain historical data with the new data added?

Excluding the requests table, each file will continue to grow as we continuously append the previous day's data to the table. We do not provide deltas for these files.

 

Due to the nature and potential size of the requests table, we only provide the previous day's data.

What is the data model for managing transactions?

There are no transactions. Flat files are a complete refresh, except for the requests table, which is append-only. Redshift is read-only.

Why do the data exports have a ".gz" extension when downloaded through the UI?

The files are in GZIP format. There are several open-source and commercial tools to unpack these files on both Windows and macOS. A popular free tool is 7zip (https://www.7-zip.org/). We do not publish the files in any other format.
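If you prefer to script the unpacking rather than use a desktop tool, the following Python sketch decompresses a single downloaded file using only the standard library. The function name and the example file name are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the unpacked file.

    If dest is omitted, the output sits next to the source with the
    trailing ".gz" dropped (for example "user_dim.txt.gz" becomes
    "user_dim.txt"; that file name is only an example).
    """
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")
    # Stream the copy so large exports are never held fully in memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```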

How do I open flat files?

The flat files are tab-delimited. They can be opened with Excel, a text editor, Tableau, or any other program that can open ".txt" files. The raw .txt files have no header row, so you will need to reference our schema documentation to add headers. This can be avoided by using the API to download the data into a data warehouse.

 

Instructure has also built an open-source command-line tool that can add these headers. Use the link below for instructions on installation and usage. You will need to download your data with the CLI and then use the "unpack" command.

 

GitHub - instructure/canvas-data-cli

Why can't headers be generated for the columns in the Canvas Data CSV export?

The primary reason is that most of the tables have more than one file. If we put headers at the top of one of the files (or all of them), it makes it more cumbersome to use simple command line tools like cat, awk, grep, wc, etc, to manipulate the data. Some customers wanted this and some did not. The choice was made to not include them.

 

The Canvas Data API can be used to get a JSON-based schema that can be used to generate headers.
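As a rough illustration of generating headers from the schema, the Python sketch below builds a header row for one table and prepends it to the table's rows. It assumes the schema JSON has a top-level "schema" object keyed by table name, each with a "columns" list whose entries carry a "name". That layout and the function names are assumptions for illustration, so check them against the schema you actually download.

```python
def header_for(schema, table):
    """Return the column names for one table, in schema order.

    Assumed layout: schema["schema"][table]["columns"] is a list of
    dicts that each have a "name" key.
    """
    return [col["name"] for col in schema["schema"][table]["columns"]]


def with_header(schema, table, lines):
    """Yield a tab-delimited header line, then the original data lines."""
    yield "\t".join(header_for(schema, table)) + "\n"
    yield from lines
```

Because the header is produced separately from the data, the same header can be reused when a table is split across several files.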

 

The CLI can also be used for those customers wanting headers in their regular flat files.

Is there a way I can download all the data files at once?

Yes, this is what the API is meant for (Canvas Data Portal API).

 

You can also use the Canvas Data CLI tool:

How to Use the Canvas Data CLI Tool

Where is the API documentation?

Canvas Data Portal

What exactly is "Hosted Data Services"?

Hosted Data Service is a service that allows Instructure to manage Canvas Data for you by automating the loading of Canvas Data into an Amazon Redshift instance.

 

While using this service, Instructure ensures that your data stays up to date, handles any schema/process changes, and handles the management of the large data set for you. The service also allows your team to focus on querying the data. This can be done by using tools that allow for ODBC connections.

 

For more information and pricing, please contact your CSM.

Is there any sort of orientation discussion for Canvas Data?

If you need an orientation to Canvas Data, please reach out to your CSM to schedule one.

Is there consulting for Canvas Data?

Yes. Canvas Data consulting is available for assistance in understanding the Canvas Data schema and creating reports based on it.

 

For more information about Canvas Data consulting and pricing, please contact your CSM.

What data is currently not in Canvas Data?

The data available in Canvas Data will be a fraction of the data available through the main Canvas API endpoints (https://canvas.instructure.com/doc/api/all_resources.html). Rather than list all of the items not available in Canvas Data, it's best to review the Canvas Data schema to see the data that is available.

 

These items are not in Canvas Data and are asked about most frequently:

  • "Total Activity", as seen in the "Users" tab within a Canvas course
  • Syllabus (Calendar portion)
  • Assignment Rubrics
  • Quiz Question Answer Submissions
  • Calendar Events & Scheduler
  • ePortfolios

Some of the tables listed in the Canvas Data schema are not in my daily data dumps. Why is that?

We do not provide data for empty tables. If you are sure that there should be data in your missing tables, please reach out to our support staff (canvasdatahelp@instructure.com).

Why do I sometimes see duplicate files for the same date & time in my Canvas Data Portal?

Note: While the rows are in a different order, the content is the same.

This is a known issue in Canvas Data that will happen from time to time. One of our jobs gets a false negative health check once in a while. As a result, we start a new / duplicate job to eliminate any possibilities for missing files. It is a very rare occasion and should resolve on its own with the next run. 

 

 

Canvas Data Portal UI Troubleshooting Questions

Question / Issue | Possible Solution

Error when trying to access the Canvas Data LTI: Insufficient Access to use LTI Tool

The user must also be an account admin at the root level. If you are an account admin and do not have access, you will need to contact another account admin who has access to the Canvas Data Portal to add you.

I see a user I did not add inside the Canvas Data Portal LTI.

Any user capable of viewing the main Admin account page within Canvas will also have access to click on the "Canvas Data Portal" link. When a user clicks on this link, that user is automatically added as a user without any permissions to the Canvas Data Portal.

 

These users can be removed from the user list. Until they are removed, they remain as users without any permissions.


Redshift Troubleshooting Questions

Question / Issue | Possible Solution

I don't see any tables in Redshift.

Ensure that you are connected to the correct database name. The name will be the same as your Canvas instance.

 

Example: If your Canvas URL is "someschool.instructure.com", your database name will be "someschool".

What is the Redshift database name?

The database name is the same as your Canvas instance name. You can also derive what it is by looking at your Redshift hostname through the Canvas Data Portal.

 

Example: If your Canvas URL is "someschool.instructure.com", your database name will be "someschool".

Example: If the hostname is "xyzu-redshift.prod.inshosteddata.com", then the database name would be "xyzu"
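The two examples above can be expressed as a small Python sketch. The function names are hypothetical, and the "-redshift" suffix rule is inferred only from the single hostname example given here, so treat it as an assumption.

```python
from urllib.parse import urlparse


def _host(value):
    """Accept either a bare hostname or a full URL and return the hostname."""
    parsed = urlparse(value if "//" in value else "//" + value)
    return parsed.hostname or ""


def db_name_from_canvas_url(url):
    """Take the first label of the Canvas hostname as the database name."""
    return _host(url).split(".")[0]


def db_name_from_redshift_host(host):
    """Strip the assumed "-redshift" suffix from the first hostname label."""
    first_label = _host(host).split(".")[0]
    suffix = "-redshift"
    if first_label.endswith(suffix):
        return first_label[: -len(suffix)]
    return first_label
```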

I cannot connect to Redshift.

This is most likely an IP whitelisting issue. Please ensure that you have added your computer's IP address to the whitelist within the Canvas Data Portal under "Credentials".

 

If you want to eliminate whitelisting as a possible source of the problem, you can add an entry to the whitelist to enable all IP addresses with 0.0.0.0/0. If the connection goes through after this, it was a whitelisting issue. If not, other likely issues are ODBC/JDBC driver issues.

My Redshift username and password are not working.

Try regenerating your credentials. If this still does not work, please reach out to your CSM.

 

CLI Tool Troubleshooting Questions

Question / Issue | Possible Solution

The CLI tool is failing, and I don't know why.

We recommend generating log files and filing a support case. To generate logs, simply run the CLI again with the extra argument of "-l debug" (minus the quotes). The output will be your log file(s).

How do I know what commands I can run?

All commands from the CLI tools can be listed by simply running "canvasDataCli --help". If you need help with a specific command, you can add it to the help command. E.g. "canvasDataCli sync --help".

How can I update the CLI?

You can update the CLI by running: "npm update -g canvas-data-cli".

How do I know what version of the CLI I'm on?

You can check your version of the CLI by running: "canvasDataCli --version".


Specific Data Questions

 

Question | Answer

How do I update the gender / birthdate / country code in Canvas?

These fields are no longer used. They will likely be deprecated in future versions of Canvas Data.

Can I get the first name, last name separated out?

In the user_dim, both "name" and "sortable_name" are available and can be "split" in a manner that is most efficient for your reporting needs.

 

The sortable_name would help manage users with multiple last names as it is in the format of "last name, first name".
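As one possible way to do the split, the Python sketch below separates a sortable_name on its first comma. The function name is hypothetical, and the fallback for values without a comma is an arbitrary choice you may want to change.

```python
def split_sortable_name(sortable_name):
    """Split "last name, first name" into a (first, last) tuple.

    Only the first comma is treated as the separator, so multi-word
    last names stay together. If there is no comma, the whole value is
    returned as the first name and the last name is left empty.
    """
    last, sep, first = sortable_name.partition(",")
    if not sep:
        return sortable_name.strip(), ""
    return first.strip(), last.strip()
```

For example, split_sortable_name("Garcia Lopez, Maria") returns ("Maria", "Garcia Lopez").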

Some of the user_ids are negative

This is normal. We have obscured the user IDs so that join keys can be shared without exposing actual Canvas IDs to users. The same user_id is used across the other tables where the user_dim's "id" is referenced.

 

Some database systems do not support unsigned 64-bit integers, so we went with signed integers.

Where are student grades located?

Student grades are located in the course_score_fact table. They were previously in enrollment_fact, but the grade fields within enrollment_fact were deprecated.

Can Canvas Data help show how faculty are using Canvas? For example:

1) Whether faculty are using Canvas for their courses or not

2) The tools and features faculty use in Canvas in their first year

3) Growth in usage and use of functionality in year two

One way to do this would be to determine the following:

  1. Courses with published assignments by enrollment term
  2. Courses with published discussions by enrollment term
  3. Courses with published quizzes by enrollment term
  4. External tool activations by course and enrollment term

 

This would give an indication of the extent to which faculty are using courses by determining how many assignments, discussions, quizzes, and external tool activations exist for each course. The numbers for year one and year two could then be compared, using the enrollment_term_dim, to find the difference in usage.
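A minimal sketch of the first measure (courses with published assignments by enrollment term) follows, again using an in-memory SQLite database as a stand-in for your warehouse. The columns used (assignment_dim.course_id, assignment_dim.workflow_state, course_dim.enrollment_term_id) and the 'published' value are assumptions to verify against the schema documentation, and the sample rows are made up.

```python
import sqlite3

# Count distinct courses per term that have at least one published assignment.
QUERY = """
SELECT c.enrollment_term_id, COUNT(DISTINCT c.id) AS courses_with_assignments
FROM course_dim AS c
JOIN assignment_dim AS a ON a.course_id = c.id
WHERE a.workflow_state = 'published'
GROUP BY c.enrollment_term_id
ORDER BY c.enrollment_term_id
"""


def build_usage_demo_db():
    """Create simplified stand-ins for course_dim and assignment_dim."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE course_dim (id INTEGER PRIMARY KEY, enrollment_term_id INTEGER);
        CREATE TABLE assignment_dim (id INTEGER PRIMARY KEY, course_id INTEGER,
                                     workflow_state TEXT);
        INSERT INTO course_dim VALUES (1, 100), (2, 100), (3, 200);
        INSERT INTO assignment_dim VALUES
            (10, 1, 'published'),
            (11, 1, 'published'),
            (12, 2, 'unpublished'),
            (13, 3, 'published');
        """
    )
    return conn


def courses_with_published_assignments(conn):
    """Return a list of (enrollment_term_id, course_count) tuples."""
    return conn.execute(QUERY).fetchall()
```

The same pattern could be repeated for discussions, quizzes, and external tools, and the per-term counts compared across years.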

I'm looking for a "data extract date" or "data as of date" so that we know when the data was loaded.

The best we can say is that the date associated with the latest dump is the data extract date.

Do we have an Entity Relationship Diagram on Canvas Data?

An ER diagram was created by someone in the Canvas Data community and the link is found below. However, the documentation describes the foreign key relationships well enough that the logic to join data from table to table can be found by reviewing the available documentation. Keep in mind that this is a community resource and includes all of the tables in the schema.

 

https://dbdiagram.io/d/5d6fd44b83427516dc0b55a6

What data is in the course_ui_navigation tables?


These tables represent the navigation settings that have been chosen by instructors for different courses.

What exactly does "pseudonym" mean in Canvas?

In Canvas, users can have one or more logins. The table with information about logins and user SIS IDs is called pseudonyms in the underlying Canvas database.

Can I obtain login information for every user?

Using the pseudonym_fact and pseudonym_dim tables, you will be able to obtain login information and SIS IDs (if SIS IDs have been added) for each user.

Do the assignment tables include assignments, quizzes, and discussions?

Yes. Using the "submission_types" value in the assignment_dim will allow you to filter data based on the type of "assignment" it is:

  • Assignment = online_text_entry, online_url, media_recording, online_upload, external_tool, on_paper, none, not_graded
  • Quiz = online_quiz
  • Discussion = discussion_topic
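The mapping above can be sketched in Python as follows. The sketch assumes that submission_types may hold several comma-separated values in one field; that storage format and the function name are assumptions, so confirm them against your own data.

```python
QUIZ_TYPES = {"online_quiz"}
DISCUSSION_TYPES = {"discussion_topic"}


def classify_assignment(submission_types):
    """Map a submission_types value to Quiz, Discussion, or Assignment.

    Anything that is not a quiz or a discussion is treated as a regular
    assignment, matching the list above.
    """
    types = {t.strip() for t in submission_types.split(",") if t.strip()}
    if types & QUIZ_TYPES:
        return "Quiz"
    if types & DISCUSSION_TYPES:
        return "Discussion"
    return "Assignment"
```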

Is any kind of record kept of the communication that occurs in conversations?

Yes, this is recorded in the conversation tables. The associated tables to use are conversation_message_participant_fact, conversation_dim, and conversation_message_dim. Please review the schema documentation to see the data available in those tables.