This document is meant to serve as a place to find answers to common questions you may have about Canvas Data. Instructure will attempt to keep this document updated as new trends emerge in questions that this FAQ does not yet address. Please feel free to add any additional questions as a comment, and we will do our best to answer them and add them to this document.
General Canvas Data Questions
How do I add another admin to the Canvas Data Portal?
Where do I find the data schema?
|Canvas Data API Documentation|
A JSON version can also be downloaded from the API with the endpoint: GET /api/schema/latest
What are 'dim' and 'fact'?
Canvas Data uses Kimball methodology to create a star-schema. "dim" stands for dimension and "fact" stands for fact. More info on star schemas and Kimball methodology can be found at: https://en.wikipedia.org/wiki/Dimensional_modeling
Essentially, facts hold the measurable, countable data about an item. For example, you might use a fact table to count enrollments. Dimensions (dim) hold the descriptive attributes of an item, so you might use the dim table to find out how many of those enrollments are teacher enrollments.
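The fact/dim split can be sketched with a small in-memory database. This is only an illustration: the two tables below are heavily simplified stand-ins, and the column names and sample rows are assumptions made for the example, so check the schema documentation for the real definitions.

```python
import sqlite3

# Simplified, illustrative tables; the real Canvas Data tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollment_dim (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE enrollment_fact (enrollment_id INTEGER, course_id INTEGER);
    INSERT INTO enrollment_dim VALUES
        (1, 'StudentEnrollment'), (2, 'TeacherEnrollment'), (3, 'StudentEnrollment');
    INSERT INTO enrollment_fact VALUES (1, 10), (2, 10), (3, 11);
""")

# The fact table alone answers "how many enrollments are there?"
total = conn.execute("SELECT COUNT(*) FROM enrollment_fact").fetchone()[0]

# Joining to the dimension answers "how many of them are teacher enrollments?"
teachers = conn.execute("""
    SELECT COUNT(*)
    FROM enrollment_fact f
    JOIN enrollment_dim d ON d.id = f.enrollment_id
    WHERE d.type = 'TeacherEnrollment'
""").fetchone()[0]

print(total, teachers)
```

The same join pattern should carry over to Redshift or any warehouse you load the flat files into.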
Where is the requests table historical data?
Historical data is typically loaded on the first Friday of the month following the activation of Canvas Data in Canvas.
Note: If you have missed your historical load, please contact your CSM to have the data reloaded.
How far back does the historical data go?
|With the exception of the requests table, all historical data since the beginning of your Canvas account creation, or 2014-03-01, is included in all tables.|
What time are the nightly files/Redshift/API available for download?
While more frequent loads are being worked on, the dump typically completes around 2am Mountain Time, which includes data for the flat files, API, and Redshift.
Note: This time is not guaranteed as many external factors may cause the load to be later in the morning.
Is each day's flat file just that day or is it historical with the new data added?
Every day, all files except requests continue to grow as we append the previous day's data to each table. We do not provide deltas for these files.
Due to the nature, and potential size, of the requests table, we only provide the previous day's data.
What is the data model for managing transactions?
|There are no transactions. Flat files are a complete refresh except for requests which is append only. Redshift is read only.|
The data exports have a .gz extension on them when downloaded through the UI, why?
|The files are in GZIP format. There are several open source and commercial tools for Windows that can open them. A popular free tool is 7-Zip (www.7-zip.org). We do not publish the files in any other format.|
How do I open up flat files?
The files are tab-delimited. They can be opened with Excel, a text editor, Tableau, or any other program that can open .txt files. The raw .txt files have no header row, so once you open one you will need to reference our schema documentation to add headers. This can be avoided by using the API to download the data into a data warehouse.
Instructure has also built an open-source command line tool that can add these headers. The link to this below has install and download instructions. You will need to download your data with the CLI and then use the unpack command.
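If you prefer to script this yourself, the following is a minimal sketch of reading one downloaded file and attaching column names to each row. It assumes the file is UTF-8 encoded and that you supply the column names, in order, from the schema documentation. The function name read_flat_file is made up for this example.

```python
import csv
import gzip


def read_flat_file(path, column_names):
    """Yield one dict per row of a gzipped, tab-delimited file that has no header row."""
    # Text mode ("rt") decompresses and decodes in one step; UTF-8 is an assumption.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Pair each value with the column name at the same position.
            yield dict(zip(column_names, row))
```

For example, calling read_flat_file on a downloaded .gz file with a list such as ["id", "name"] would yield rows like {"id": "1", "name": "Alice"}. All values come back as strings, so convert types as needed.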
Why can't headers be generated for the columns in the Canvas Data CSV export?
The primary reason is that most tables are split across more than one file. If we put headers at the top of one of the files (or all of them), it becomes more cumbersome to use simple command-line tools such as cat, awk, grep, and wc to manipulate the data. In short: some customers want headers and some don't, so we had to make a call.
The API can be used to get a JSON based schema that can be used to generate headers.
The CLI can also be used for those customers wanting headers in their regular flat files.
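As a rough sketch of generating a header line from the JSON schema: the layout assumed below (a top-level "schema" object keyed by table name, each with a "columns" list of objects that have a "name") is an assumption for this example and should be checked against the schema you actually download. The function name header_line and the sample document are made up.

```python
import json


def header_line(schema_json, table_name):
    """Build a tab-separated header line for one table from a schema document."""
    document = json.loads(schema_json)
    # Assumed layout: {"schema": {<table>: {"columns": [{"name": ...}, ...]}}}
    columns = document["schema"][table_name]["columns"]
    return "\t".join(column["name"] for column in columns)


# Hypothetical, heavily trimmed schema document used only for illustration.
sample = json.dumps({
    "schema": {
        "user_dim": {"columns": [{"name": "id"}, {"name": "canvas_id"}, {"name": "name"}]}
    }
})
print(header_line(sample, "user_dim"))
```

Keeping the header as a separate line, rather than writing it into every part file, leaves the part files easy to concatenate.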
Is there a way I can download all the data files at once?
Yes, this is what the API is meant for.
You can also use the Canvas Data CLI tool:
Where is the API documentation?
|Canvas Data Portal|
What exactly is Hosted Data Services?
With Hosted Data Services, Instructure manages Canvas Data for you in an Amazon Redshift instance. Instructure ensures that your data stays up to date, handles any schema or process changes, and takes care of storing the large amount of data for you. All you need to do is use any tool that accepts an ODBC connection to pull your data from Redshift.
For more information and pricing, please contact your CSM.
Is there any sort of orientation discussion for Canvas Data?
|If you need an orientation to Canvas Data, please reach out to your CSM to schedule one.|
What data is currently not in Canvas Data?
|Syllabus, Outcomes, Rubrics, Quiz Question Answer Submissions, Conferences, Calendar, and Scheduler|
I am consistently not receiving all of the tables in my daily data dumps
|We do not provide data for tables that are empty. If you are sure that there should be data in your missing tables, please reach out to our support staff.|
Canvas Data Portal UI Troubleshooting Questions
What causes this error when trying to access the Canvas Data LTI: Insufficient Access to use LTI Tool?
|The user must also be an account admin at the root level. If you are an account admin and do not have access, you will need to contact another account admin who has access to the Canvas Data Portal to add you.|
I see a user I did not add inside the Canvas Data Portal LTI: User Listed I did not add?
|Any user capable of viewing any account page will be able to click the Canvas Data link. However, they will have no access until you give them access. This is because the "Global User ID" field required to associate users inside the portal cannot be seen in the UI or API.|
Redshift Troubleshooting Questions
I don't see any tables in Redshift
Ensure that you are connected to the correct database name. The name will be the same as your Canvas instance.
Example: Your Canvas url is "someschool.instructure.com", your database name will be "someschool"
What is the Redshift database name?
The database name is the same as your Canvas instance name. You can also derive what it is by looking at your Redshift hostname through the Canvas Data Portal.
Example: Your Canvas url is "someschool.instructure.com", your database name will be "someschool"
Example: If the hostname is "xyzu-redshift.prod.inshosteddata.com" then the database name would be "xyzu"
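The two rules above can be written as small helper functions. This is a sketch of the naming convention described in this answer only; the function names are made up, and the result should be confirmed against what the Canvas Data Portal shows.

```python
from urllib.parse import urlparse


def database_name_from_canvas_url(canvas_url):
    """Return the first label of the Canvas hostname, e.g. 'someschool'."""
    # urlparse only recognises the host when the value contains '//'.
    parsed = urlparse(canvas_url if "//" in canvas_url else "//" + canvas_url)
    return parsed.hostname.split(".")[0]


def database_name_from_redshift_host(redshift_host):
    """Return the part of the first hostname label that precedes '-redshift'."""
    first_label = redshift_host.split(".")[0]
    return first_label.split("-redshift")[0]


print(database_name_from_canvas_url("someschool.instructure.com"))
print(database_name_from_redshift_host("xyzu-redshift.prod.inshosteddata.com"))
```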
Cannot connect to Redshift
|This is almost always an IP whitelist problem. Please ensure that you have added your computer's IP address to the whitelist area within the Canvas Data Portal under "Credentials".|
If you want to eliminate whitelisting as a possible source of the problem, you can add the entry 0.0.0.0/0 to the whitelist, which allows all IP addresses. If the connection goes through after this, it is a whitelist problem. If not, the next most likely cause is an ODBC/JDBC driver problem.
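To sanity-check a whitelist entry before entering it, you can test locally whether your IP address falls inside it. The sketch below assumes entries are written in CIDR notation, as in the 0.0.0.0/0 example. It does not talk to the portal or to Redshift, and the function name is made up.

```python
import ipaddress


def is_whitelisted(client_ip, whitelist):
    """Return True if client_ip falls inside any CIDR entry in whitelist."""
    address = ipaddress.ip_address(client_ip)
    # strict=False tolerates entries whose host bits are set, e.g. 198.51.100.7/24.
    return any(
        address in ipaddress.ip_network(entry, strict=False) for entry in whitelist
    )


print(is_whitelisted("203.0.113.7", ["0.0.0.0/0"]))        # every IPv4 address matches
print(is_whitelisted("203.0.113.7", ["198.51.100.0/24"]))  # outside the listed range
```

Remember that the address that matters is the public IP your connection leaves from, which may differ from the one configured on your computer.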
Redshift username and password not working
|Try regenerating your credentials. If this still does not work, please reach out to your CSM.|
CLI Tool Troubleshooting Questions
The CLI tool is failing, and I don't know why?
We recommend getting logs, and filing a support case. To get your logs simply run the CLI again with the extra argument of "-l debug" (minus the quotes). The output will be your logs.
How do I know what commands I can run?
|All commands from the CLI tools can be listed by simply running "canvasDataCli --help". If you need help with a specific command, you can add it to the help command. E.g. "canvasDataCli sync --help".|
How can I update the CLI?
|You can update the CLI by running: "npm update -g canvas-data-cli".|
How do I know what version of the CLI I'm on?
|You can check your version of the CLI by running: "canvasDataCli --version".|
Specific Data Questions
How do I update the gender / birthdate / country code in Canvas?
|These fields are no longer used. They will likely be deprecated in future versions of Canvas Data.|
Can I get first name, last name separated out?
|Although the data is entered separately, it is concatenated before it is stored in the database, so we do not have it separated out. There is a sortable_name field that lists the name as "last name, first name", which can be used if you want to sort by last name. There is also a short_name field that is just the user's nickname.|
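If an approximate split is good enough, sortable_name can be divided at its first comma. This is a sketch only: it assumes the "last name, first name" layout described above, and names without a comma or with several parts will not split cleanly. The function name is made up.

```python
def split_sortable_name(sortable_name):
    """Split 'Last, First' into (first, last) at the first comma."""
    last, _, first = sortable_name.partition(",")
    # With no comma present, partition leaves everything in `last` and `first` empty.
    return first.strip(), last.strip()


print(split_sortable_name("Doe, Jane"))
print(split_sortable_name("Cher"))
```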
Some of the user_id's are negative
|This is normal. We have obscured the user IDs so that join keys can be shared without exposing actual Canvas IDs to users. We wanted to use the entire 64-bit integer range, and because some database systems do not support unsigned 64-bit integers, we went with signed integers.|
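The sketch below shows only the general arithmetic behind this: a value in the upper half of the unsigned 64-bit range appears negative when the same bits are stored as a signed integer. It is not the scheme used to obscure the IDs, which is not described here, and the function name is made up.

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a signed 64-bit integer."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values at or above 2**63 have the top bit set, which signed storage reads as negative.
    return value - 2**64 if value >= 2**63 else value


print(as_signed_64(5))          # lower half: unchanged
print(as_signed_64(2**64 - 1))  # upper half: negative
```

Either way, treat the IDs as opaque join keys and store them in a signed 64-bit column.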
Where are student grades?
Can Canvas Data help with how faculty are using Canvas?
1) Whether faculty are using Canvas for their courses or not.
2) What tools and features faculty are using in Canvas in year 1.
3) Growth in usage and growth in use of functionality in year 2.
|One way to do this would be to determine the following: 1) courses with assignments by enrollment term, 2) courses with discussions by enrollment term, 3) courses with quizzes by enrollment term, and 4) external tool activations by course and enrollment term.|
This would indicate the extent to which faculty are using the course shell, by determining how many assignments, discussions, quizzes, and external tool activations exist for each course. These numbers could then be compared between year 1 and year 2 by looking at the enrollment term dim.
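The first of these measures, courses with assignments by enrollment term, could be sketched as follows. The tables are simplified stand-ins with only the join columns, and the sample rows and term names are made up for the example. Check the schema documentation for the real column names before running anything similar against your data.

```python
import sqlite3

# Simplified, illustrative tables holding only the columns needed for the joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollment_term_dim (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course_dim (id INTEGER PRIMARY KEY, name TEXT, enrollment_term_id INTEGER);
    CREATE TABLE assignment_dim (id INTEGER PRIMARY KEY, course_id INTEGER);
    INSERT INTO enrollment_term_dim VALUES (1, 'Year 1'), (2, 'Year 2');
    INSERT INTO course_dim VALUES (10, 'Biology', 1), (11, 'Biology', 2), (12, 'History', 2);
    INSERT INTO assignment_dim VALUES (100, 10), (101, 11), (102, 11), (103, 11);
""")

# LEFT JOIN keeps courses without assignments so they still count toward the term total.
rows = conn.execute("""
    SELECT t.name,
           COUNT(DISTINCT c.id)        AS courses,
           COUNT(DISTINCT a.course_id) AS courses_with_assignments,
           COUNT(a.id)                 AS assignments
    FROM enrollment_term_dim t
    JOIN course_dim c ON c.enrollment_term_id = t.id
    LEFT JOIN assignment_dim a ON a.course_id = c.id
    GROUP BY t.id, t.name
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

The same shape of query, with the discussion, quiz, or external tool tables in place of assignment_dim, would cover the other measures.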
I'm looking for a "data extract date" or "data as of date" so that we know when the data was loaded
|The best we can say is that the date associated with the latest dump is the data extract date.|
Do we have Entity Relationship Diagram on Canvas Data?
|We do not have an ER diagram. However, the documentation describes the foreign key relationships.|
What data is in the course_ui_navigation tables?
|These tables represent the navigation settings that have been chosen by instructors for different courses.|
What exactly does pseudonym mean in Canvas?
|In Canvas, users can have one or more logins. The table with information about logins is called pseudonyms in the underlying Canvas database.|
Can I obtain login information from every user?
|From the pseudonym_fact, and its corresponding pseudonym_dim, tables you will be able to obtain login information for every user.|
Do the assignment tables include assignments, quizzes, and graded discussions?
Is any kind of record kept of the communication that occurs in conversations?
|This would be within conversations. The associated tables to use are conversation_message_participant_fact, conversation_dim and conversation_message_dim|