
Canvas Data FAQ



Canvas Data is a service from Canvas that will provide schools with optimized access to their data for reporting and queries.

This document is meant to serve as a place to find answers to common questions you may have about Canvas Data. Instructure will attempt to keep this document updated as trends in questions arise that are not addressed within this FAQ. Please feel free to add any additional questions as a comment, and we will do our best to answer them and add to this document.

General Canvas Data Questions

Question Answer

How do I add another admin to the Canvas Data Portal?

How do I manage Canvas Data admin users? 

Where do I find the data schema?

Canvas Data API Documentation

A JSON version can also be downloaded from the API with the endpoint: GET /api/schema/latest

What are 'dim' and 'fact'?

Canvas Data uses Kimball methodology to create a star-schema. "dim" stands for dimension and "fact" stands for fact. More info on star schemas and Kimball methodology can be found at

Essentially, facts provide more general detail about the item. For example, you may use a fact to count enrollments. Dimensions, meanwhile, provide insight into the data about an item, meaning a dimension could be used to show how many of those enrollments are "Teacher" enrollments.
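As a rough illustration of how a fact table and a dimension table work together, the sketch below builds two tiny in-memory tables with SQLite and counts enrollments by type. The table layouts, the column names (enrollment_fact.enrollment_id, enrollment_dim.id, enrollment_dim.type), and the sample rows are simplified assumptions for this example only; please check the schema documentation for the real columns.

```python
import sqlite3


def count_enrollments_by_type(conn):
    """Join the fact table to its dimension and count rows per enrollment type."""
    rows = conn.execute(
        """
        SELECT d.type, COUNT(*) AS enrollments
        FROM enrollment_fact AS f
        JOIN enrollment_dim AS d ON d.id = f.enrollment_id
        GROUP BY d.type
        ORDER BY d.type
        """
    ).fetchall()
    return dict(rows)


# Hypothetical sample data, not real Canvas Data rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment_dim (id INTEGER PRIMARY KEY, type TEXT)")
conn.execute("CREATE TABLE enrollment_fact (enrollment_id INTEGER, course_id INTEGER)")
conn.executemany(
    "INSERT INTO enrollment_dim VALUES (?, ?)",
    [(1, "TeacherEnrollment"), (2, "StudentEnrollment"), (3, "StudentEnrollment")],
)
conn.executemany(
    "INSERT INTO enrollment_fact VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10)],
)

counts = count_enrollments_by_type(conn)
print(counts)
```

The fact table supplies the rows to count, and the dimension supplies the attribute ("type") used to group them.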

Is historical data available for the requests table?

Yes. Historical data is loaded on the second Wednesday of every month following the activation of Canvas Data in Canvas.

Historical requests (page view) data needs to be uploaded separately after Canvas Data has been enabled. Schools that are enabled with Canvas Data will get their historical data loaded in a batch sometime during the first month of activation. Once historical data has been loaded, it does not need to be loaded again.

Note: If you have missed your historical load, please contact your CSM to have the data reloaded.

How far back does the historical data go?

With the exception of the requests table, all historical data since the beginning of the customer's subscription is included. For requests data, historical data is loaded starting with 2014-03-01 or the beginning of the customer's subscription, whichever is later.
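The "whichever is later" rule for requests data can be sketched in a few lines. The function name and the sample subscription dates are hypothetical; only the 2014-03-01 date comes from the answer above.

```python
from datetime import date

# Earliest date for historical requests data, per the answer above.
REQUESTS_FLOOR = date(2014, 3, 1)


def requests_history_start(subscription_start: date) -> date:
    """Return the later of the fixed floor date and the subscription start."""
    return max(REQUESTS_FLOOR, subscription_start)


# Hypothetical subscription dates for illustration.
print(requests_history_start(date(2013, 8, 15)))
print(requests_history_start(date(2016, 1, 4)))
```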

What time are the daily files/Redshift/API available for download?

Typically, the dump completes around 2:00 AM, Mountain Time. This data is the same across the flat files available in the Canvas Data Portal UI, the data used for the population of Redshift, and files available via the Canvas Data API endpoints.

Note: This time is not guaranteed as many external factors may cause the load to be later in the morning.

Does each day's flat file contain data only for that day, or does it contain historical data with the new data added?

Excluding the requests table, each file will continue to grow as we continuously append the previous day's data to the table. We do not provide deltas for these files.

Due to the nature and potential size of the requests table, we only provide the previous day's data.

What is the data model for managing transactions?

There are no transactions. Flat files are a complete refresh, except for requests, which is append-only. Redshift is read-only.
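If you load the flat files into your own store, the refresh-versus-append rule might look something like the sketch below. The "warehouse" here is just a Python dict standing in for a real database, and the function name is hypothetical; treat it as one possible approach rather than a prescribed loader.

```python
def apply_daily_dump(warehouse, table, rows):
    """Load one day's rows for a table into an in-memory 'warehouse' dict."""
    if table == "requests":
        # requests files hold only the previous day's data, so append them.
        warehouse.setdefault(table, []).extend(rows)
    else:
        # Every other file is a complete refresh, so replace what was loaded.
        warehouse[table] = list(rows)
    return warehouse


# Hypothetical rows for illustration.
warehouse = {}
apply_daily_dump(warehouse, "course_dim", [("c1",)])
apply_daily_dump(warehouse, "course_dim", [("c1",), ("c2",)])
apply_daily_dump(warehouse, "requests", [("r1",)])
apply_daily_dump(warehouse, "requests", [("r2",)])
print(warehouse)
```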

Why do the data exports have a ".gz" extension when downloaded through the UI?

The files are in GZIP format. There are several open-source and commercial tools to unpack these files on both Windows and Mac OS. A popular free tool is 7zip. We do not publish the files in any other format.
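If you would rather script the unpacking step than use a desktop tool, Python's standard library can read GZIP files directly. This is a minimal sketch; the file names are made up for the demo, and the sample content is not real Canvas Data.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(src_path, dest_path):
    """Decompress a .gz file to dest_path using only the standard library."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
    return dest_path


# Round-trip demo with a throwaway file (hypothetical names and content).
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.gz")
    txt_path = os.path.join(tmp, "sample.txt")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"1\tExample Course\n")
    gunzip_file(gz_path, txt_path)
    with open(txt_path, "rb") as f:
        restored = f.read()

print(restored)
```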

How do I open flat files?

The flat files are tab-delimited. They can be opened with Excel, a text editor, Tableau, or any other program that can open ".txt" files. Once you open the raw .txt file, you will need to reference our schema documentation to add headers. This can be avoided by using the API to download the data into a data warehouse.

Instructure also has built an open-source command line tool, capable of adding these headers in. Use the link below for instructions on installation and usage. The user will need to download their data with the CLI, and then use the "unpack" command.

GitHub - instructure/canvas-data-cli

Why can't headers be generated for the columns in the Canvas Data CSV export?

The primary reason is that most of the tables have more than one file. If we put headers at the top of one of the files (or all of them), it makes it more cumbersome to use simple command line tools like cat, awk, grep, wc, etc, to manipulate the data. Some customers wanted this and some did not. The choice was made to not include them.

The Canvas Data API can be used to get a JSON-based schema that can be used to generate headers.

The CLI can also be used for those customers wanting headers in their regular flat files.
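For those who want to add headers themselves, here is one way it could be done once you have the column names for a table. The function name, column names, and rows below are hypothetical; in practice the column list would come from the schema documentation or the JSON schema, whose exact structure is not shown here.

```python
def add_headers(column_names, lines):
    """Return the rows of a tab-delimited table with one header row on top.

    `lines` can be the concatenated lines of every file for the table,
    so the header appears once no matter how many files there are.
    """
    out = ["\t".join(column_names)]
    for line in lines:
        row = line.rstrip("\n")
        # Guard against using the wrong column list for this table.
        if row.count("\t") + 1 != len(column_names):
            raise ValueError("row does not match the number of columns")
        out.append(row)
    return out


# Hypothetical columns and rows for illustration.
with_header = add_headers(["id", "name"], ["1\tAda\n", "2\tLin\n"])
print("\n".join(with_header))
```

Keeping the header out of the raw files and adding it at read time preserves the ability to use tools like cat and wc on the original files.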

Is there a way I can download all the data files at once?

Yes, this is what the API is meant for (Canvas Data Portal API).

A CLI tool is also available to use.

Where is the API documentation?

Canvas Data Portal

What exactly is "Hosted Data Services"?

Hosted Data Service is a service that allows Instructure to manage Canvas Data for you by automating the loading of Canvas Data into an Amazon Redshift instance.

While using this service, Instructure ensures that your data stays up to date, handles any schema/process changes, and handles the management of the large data set for you. The service also allows your team to focus on querying the data. This can be done by using tools that allow for ODBC connections.

For more information and pricing, please contact your CSM.

Is there any sort of orientation discussion for Canvas Data?

If you need an orientation to Canvas Data, please reach out to your CSM to schedule one.

Is there consulting for Canvas Data?

Yes—Canvas Data consulting is available for assistance in understanding the Canvas Data schema and creating reports based on the Canvas Data schema.

For more information about Canvas Data consulting and pricing, please contact your CSM.

What data is currently not in Canvas Data?

The data available in Canvas Data is a fraction of the data available in the main Canvas API endpoints. Rather than list all of the items not available in Canvas Data, it's best to review the Canvas Data schema to see the data that is available.

These items are not in Canvas Data and are asked about most frequently:

  • "Total Activity", as seen in the "Users" tab within a Canvas course

  • Syllabus (Calendar portion)

  • Assignment Rubrics

  • Quiz Question Answer Submissions

  • Calendar Events & Scheduler

  • ePortfolios

Some of the tables listed in the Canvas Data schema are not in my daily data dumps. Why is that?

We do not provide data for empty tables. If you are sure that there should be data in your missing tables, please reach out to our support staff.

Why do I sometimes see duplicate files for the same date & time in my Canvas Data Portal?

This is a known issue in Canvas Data that will happen from time to time. One of our jobs gets a false negative health check once in a while. As a result, we start a new / duplicate job to eliminate any possibilities for missing files. It is a very rare occasion and should resolve on its own with the next run. 

Note: While the rows are in a different order, the content is the same.
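If you want to confirm that two files for the same date really do hold the same content, one option is to compare them in a way that ignores row order. The sketch below hashes the sorted rows; the function name and sample rows are hypothetical, and for very large files you may prefer a different approach since this one sorts everything in memory.

```python
import hashlib


def content_fingerprint(lines):
    """Hash the rows of a file independent of the order they appear in."""
    digest = hashlib.sha256()
    for row in sorted(line.rstrip("\n") for line in lines):
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical rows: same content, different order.
first_file = ["1\tx\n", "2\ty\n"]
second_file = ["2\ty\n", "1\tx\n"]
same_content = content_fingerprint(first_file) == content_fingerprint(second_file)
print(same_content)
```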


Canvas Data Implementation Questions

Question / Issue Possible Solution

What is required to implement the Canvas Data Integration? How long does it take?

Canvas Data is a service that provides each client access to key Canvas data points, delivered in a purposefully optimized form for queries and reports. If you are interested, contact your Customer Success Manager or Implementation Consultant and they can schedule time for the implementation.

What does the Canvas Data integration process look like?

Canvas' implementation team will lead you through the Canvas Data integration process after you have made the request. The process for enabling a customer to use Canvas Data involves the following steps:

  • Notify your Customer Success Manager or Implementation Consultant that you would like to enable Canvas Data for your instance.
  • Your Implementation Consultant will meet with you and find out what admin user you want to designate as your primary Canvas Data administrator.
  • The Implementation Consultant will add the primary Canvas Data administrator to the Canvas Data system and install the External App into your Canvas account.

What are the considerations for Canvas Data admin?

You must designate a Canvas Data Administrator to manage who is given access to the Canvas Data dataset, which is the entire dataset in Canvas, including personal information about users. This person will also manage which IP address ranges can access the database. The Canvas Data Administrator must be an Account Admin. The individual should be a Canvas account admin who understands the organization's data governance procedures and policies and has enough technical proficiency to understand IP address ranges and database connection strings. The Canvas Data Administrator will receive access to the documentation to generate Canvas Data files and downloads.


Canvas Data Portal UI Troubleshooting Questions

Question / Issue Possible Solution

Error when trying to access the Canvas Data LTI: Insufficient Access to use LTI Tool

The user must also be an account admin at the root level. If you are an account admin and do not have access, you will need to contact another account admin who has access to the Canvas Data Portal to add you.

I see a user I did not add inside the Canvas Data Portal LTI.

Any user capable of viewing the main Admin account page within Canvas will also have access to click on the "Canvas Data Portal" link. When a user clicks on this link, that user is automatically added as a user without any permissions to the Canvas Data Portal.

These users can be removed from the user list. If they are not removed, they remain as users without any permissions.


Redshift Troubleshooting Questions

Question / Issue Possible Solution

I don't see any tables in Redshift.

Ensure that you are connected to the correct database name. The name will be the same as your Canvas instance.

Example: Your Canvas URL is "", your database name will be "someschool"

What is the Redshift database name?

The database name is the same as your Canvas instance name. You can also derive what it is by looking at your Redshift hostname through the Canvas Data Portal.

Example: Your Canvas URL is "", your database name will be "someschool"

Example: If the hostname is "", then the database name would be "xyzu"

I cannot connect to Redshift.

This is most likely an IP whitelisting issue. Please ensure that you have added your computer's IP address to the whitelist area within the Canvas Data Portal under "Credentials".

If you want to eliminate whitelisting as a possible source of the problem, you can add an entry to the whitelist that enables all IP addresses. If the connection goes through after this, it was a whitelisting issue. If not, the other likely cause is an ODBC/JDBC driver issue.
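To help separate a network or whitelisting problem from a driver problem, a plain TCP check can be useful before involving ODBC/JDBC at all. This is only a sketch: the function name is hypothetical, and the default port of 5439 is the usual Amazon Redshift default, which is an assumption here, so please use the host and port shown under "Credentials" in the Canvas Data Portal.

```python
import socket


def can_reach(host, port=5439, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    5439 is assumed as the port because it is the common Redshift default;
    substitute the port from your own credentials if it differs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

If this returns False from a whitelisted machine, the problem is likely at the network level. If it returns True but your tool still cannot connect, the driver or credentials are more likely at fault.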

My Redshift username and password are not working.

Try regenerating your credentials. If this still does not work, please reach out to your CSM.


CLI Tool Troubleshooting Questions

Question / Issue Possible Solution

The CLI tool is failing, and I don't know why.

We recommend generating log files and filing a support case. To generate logs, simply run the CLI again with the extra argument of "-l debug" (minus the quotes). The output will be your log file(s).

How do I know what commands I can run?

All commands from the CLI tools can be listed by simply running "canvasDataCli --help". If you need help with a specific command, you can add it to the help command. E.g. "canvasDataCli sync --help".

How can I update the CLI?

You can update the CLI by running: "npm update -g canvas-data-cli".

How do I know what version of the CLI I'm on?

You can check your version of the CLI by running: "canvasDataCli --version".


Specific Data Questions

Question Answer

How do I update the gender / birthdate / country code in Canvas?

These fields are no longer used. They will likely be deprecated in future versions of Canvas Data.

Can I get the first name, last name separated out?

In the user_dim, both "name" and "sortable_name" are available and can be "split" in a manner that is most efficient for your reporting needs.

The sortable_name would help manage users with multiple last names as it is in the format of "last name, first name".
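As one example of such a split, the sketch below separates a sortable_name on its first comma. The function name and sample values are hypothetical, and the handling of values without a comma is an assumption you may want to change for your own reporting.

```python
def split_sortable_name(sortable_name):
    """Split 'last name, first name' into a (first, last) tuple."""
    last, sep, first = sortable_name.partition(",")
    if not sep:
        # No comma present: assume the whole value is a single given name.
        return (sortable_name.strip(), "")
    return (first.strip(), last.strip())


# Hypothetical names for illustration.
print(split_sortable_name("Garcia Lopez, Maria"))
print(split_sortable_name("Cher"))
```

Splitting on the comma keeps a multi-word last name such as "Garcia Lopez" together, which splitting "name" on spaces would not.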

Why are some of the user_ids negative?

This is normal. We have obscured the user IDs so that join keys can be shared without sharing actual Canvas IDs with users. The same user_id is used across the other tables where the user_dim's "id" is referenced.

Some database systems do not support unsigned 64-bit integers, so we went with signed integers.

Where are student grades located?

Student grades are located in the course_score_fact table. They were previously in enrollment_fact, but those fields within enrollment_fact were deprecated.

Can Canvas Data help with how faculty are using Canvas?

1) Whether faculty are using Canvas for their courses or not

2) The tools and features faculty use in Canvas in their first year

3) Growth in usage and use of functionality in year two.

One way to do this would be to determine the following:

  1. Courses with published assignments by enrollment term
  2. Courses with published discussions by enrollment term
  3. Courses with published quizzes by enrollment term
  4. External tool activations by course and enrollment term

This would give an indication of the extent to which faculty are using courses by determining how many assignments, discussions, quizzes, and external tool activations exist for each course. The numbers for year one and year two could then be compared to find the difference in usage by looking at the enrollment_term_dim.
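The first item in the list above (courses with published assignments by enrollment term) might be approached with a query along these lines. The sketch runs against a tiny in-memory SQLite database so it can be tried anywhere; the column names used (assignment_dim.course_id, assignment_dim.workflow_state, course_dim.enrollment_term_id), the 'published' value, and the sample rows are assumptions for illustration, so please confirm them against the schema documentation before adapting this.

```python
import sqlite3

QUERY = """
SELECT c.enrollment_term_id, COUNT(DISTINCT c.id) AS courses
FROM course_dim AS c
JOIN assignment_dim AS a ON a.course_id = c.id
WHERE a.workflow_state = 'published'
GROUP BY c.enrollment_term_id
ORDER BY c.enrollment_term_id
"""


def courses_with_published_assignments(conn):
    """Count courses per term that have at least one published assignment."""
    return dict(conn.execute(QUERY).fetchall())


# Hypothetical sample data, not real Canvas Data rows.
usage_conn = sqlite3.connect(":memory:")
usage_conn.execute("CREATE TABLE course_dim (id INTEGER, enrollment_term_id INTEGER)")
usage_conn.execute(
    "CREATE TABLE assignment_dim (id INTEGER, course_id INTEGER, workflow_state TEXT)"
)
usage_conn.executemany(
    "INSERT INTO course_dim VALUES (?, ?)", [(1, 100), (2, 100), (3, 200)]
)
usage_conn.executemany(
    "INSERT INTO assignment_dim VALUES (?, ?, ?)",
    [(11, 1, "published"), (12, 1, "published"), (13, 2, "unpublished"), (14, 3, "published")],
)

per_term = courses_with_published_assignments(usage_conn)
print(per_term)
```

Comparing the counts for a year-one term against a year-two term gives the growth figure described above.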

I'm looking for a "data extract date" or "data as of date" so that we know when the data was loaded.

The best we can say is that the date associated with the latest dump is the data extract date.

Do we have an Entity Relationship Diagram on Canvas Data?

An ER diagram was created by someone in the Canvas Data community and the link is found below. However, the documentation describes the foreign key relationships well enough that the logic to join data from table to table can be found by reviewing the available documentation. Keep in mind that this is a community resource and includes all of the tables in the schema.

What data is in the course_ui_navigation tables?

These tables represent the navigation settings that have been chosen by instructors for different courses.

What exactly does "pseudonym" mean in Canvas?

In Canvas, users can have one or more logins. The table with information about logins and user SIS IDs is called pseudonyms in the underlying Canvas database.

Can I obtain login information for every user?

Using the pseudonym_fact and pseudonym_dim tables, you will be able to obtain login information and SIS IDs (if SIS IDs have been added) for each user.

Do the assignment tables include assignments, quizzes, and discussions?

Yes. Using the "submission_types" value in the assignment_dim will allow you to filter data based on the type of "assignment" it is:

  • Assignment = online_text_entry, online_url, media_recording, online_upload, external_tool, on_paper, none, not_graded
  • Quiz = online_quiz
  • Discussion = discussion_topic
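The mapping above could be applied in code roughly as follows. This sketch assumes that a submission_types value may hold several types separated by commas, which is an assumption about the data format; the function name is hypothetical.

```python
QUIZ_TYPES = {"online_quiz"}
DISCUSSION_TYPES = {"discussion_topic"}


def classify_assignment(submission_types):
    """Map a submission_types value to 'Quiz', 'Discussion', or 'Assignment'."""
    # Assumed format: one or more type names separated by commas.
    types = {t.strip() for t in submission_types.split(",") if t.strip()}
    if types & QUIZ_TYPES:
        return "Quiz"
    if types & DISCUSSION_TYPES:
        return "Discussion"
    # Everything else in the list above counts as a regular assignment.
    return "Assignment"


print(classify_assignment("online_quiz"))
print(classify_assignment("online_text_entry,online_upload"))
```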

Is any kind of record kept of the communication that occurs in conversations?

This would be within conversations. The associated tables to use are conversation_message_participant_fact, conversation_dim, and conversation_message_dim. Please review the schema documentation to see the data available in those tables.