| Question |
Answer |
|
How do I add another admin to the Canvas Data Portal?
|
How do I manage Canvas Data admin users?
|
|
Where do I find the data schema?
|
Canvas Data API Documentation
A JSON version can also be downloaded from the API with the endpoint: GET /api/schema/latest
|
|
What are 'dim' and 'fact'?
|
Canvas Data uses the Kimball methodology to create a star schema. "dim" stands for dimension and "fact" stands for fact. More info on star schemas and the Kimball methodology can be found at https://en.wikipedia.org/wiki/Dimensional_modeling
Essentially, facts hold the measurable data about an item. For example, you may use a fact to count enrollments. Meanwhile, the dimensions hold the descriptive attributes of an item, meaning the dimension could be used to show how many of those enrollments are "Teacher" enrollments.
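As a rough illustration of the fact/dimension split, the sketch below counts enrollments from a fact table and then breaks the count down by an attribute stored in the dimension table. The rows and column names (`enrollment_id`, `type`, and so on) are hypothetical stand-ins, not the actual Canvas Data schema.

```python
from collections import Counter

# Hypothetical rows; real Canvas Data tables have many more columns.
enrollment_dim = [
    {"id": 1, "type": "TeacherEnrollment"},
    {"id": 2, "type": "StudentEnrollment"},
    {"id": 3, "type": "StudentEnrollment"},
]
enrollment_fact = [
    {"enrollment_id": 1, "course_id": 10},
    {"enrollment_id": 2, "course_id": 10},
    {"enrollment_id": 3, "course_id": 11},
]


def count_by_type(facts, dims):
    """Count fact rows grouped by the 'type' attribute of the joined dimension row."""
    type_by_id = {d["id"]: d["type"] for d in dims}
    return Counter(type_by_id[f["enrollment_id"]] for f in facts)


total = len(enrollment_fact)  # the fact table answers "how many enrollments?"
by_type = count_by_type(enrollment_fact, enrollment_dim)  # the dimension answers "of which kind?"
print(total, dict(by_type))
```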
|
|
Is historical data available for the requests table?
|
Yes. Historical data is loaded on the second Wednesday of every month following the activation of Canvas Data in Canvas.
Historical requests (page view) data needs to be uploaded separately after Canvas Data has been enabled. Schools that are enabled with Canvas Data will get their historical data loaded in a batch sometime during the first month of activation. Once historical data has been loaded, it does not need to be loaded again.
Note: If you have missed your historical load, please contact your CSM to have the data reloaded.
|
|
How far back does the historical data go?
|
With the exception of the requests table, all historical data since the beginning of the customer's subscription is included. For requests data, historical data is loaded starting with 2014-03-01 or the beginning of the customer's subscription, whichever is later.
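The "whichever is later" rule for requests data can be written as a one-line comparison. This is only a sketch of the rule as stated above; the function name and the idea of passing the subscription start as a `date` are illustrative assumptions.

```python
from datetime import date

# Earliest date for which historical requests data is loaded, per the answer above.
REQUESTS_FLOOR = date(2014, 3, 1)


def requests_history_start(subscription_start: date) -> date:
    """Return the later of 2014-03-01 and the customer's subscription start."""
    return max(REQUESTS_FLOOR, subscription_start)


print(requests_history_start(date(2013, 1, 15)))  # subscription predates the floor
print(requests_history_start(date(2016, 8, 1)))   # subscription started after the floor
```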
|
|
What time are the daily files/Redshift/API available for download?
|
Typically, the dump completes around 2:00 AM, Mountain Time. This data is the same across the flat files available in the Canvas Data Portal UI, the data used for the population of Redshift, and files available via the Canvas Data API endpoints.
Note: This time is not guaranteed as many external factors may cause the load to be later in the morning.
|
|
Does each day's flat file contain data only for that day, or does it contain historical data with the new data added?
|
Excluding the requests table, each file will continue to grow as we continuously append the previous day's data to the table. We do not provide deltas for these files.
Due to the nature and potential size of the requests table, we only provide the previous day's data.
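For anyone loading the daily files into their own store, the difference matters: most tables should be replaced wholesale each day, while requests should be appended. The sketch below models this with an in-memory dictionary; the `store` structure and the function name are hypothetical, and a real loader would target a database instead.

```python
def apply_daily_dump(store, table, rows):
    """Apply one day's file for a table to a local store (a dict of lists).

    The requests table only carries the previous day's rows, so they are
    appended. Every other table is a full snapshot, so it replaces what
    was stored before.
    """
    if table == "requests":
        store.setdefault(table, []).extend(rows)
    else:
        store[table] = list(rows)
    return store


store = {}
apply_daily_dump(store, "course_dim", [("1", "Intro")])
apply_daily_dump(store, "course_dim", [("1", "Intro"), ("2", "Algebra")])  # full refresh
apply_daily_dump(store, "requests", [("r1",)])
apply_daily_dump(store, "requests", [("r2",)])  # previous day's rows only, appended
```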
|
|
What is the data model for managing transactions?
|
There are no transactions. Flat files are a complete refresh, except for requests, which is append-only. Redshift is read-only.
|
|
Why do the data exports have a ".gz" extension when downloaded through the UI?
|
The files are in GZIP format. There are several open-source and commercial tools to unpack these files on both Windows and macOS. A popular free tool is 7-Zip (https://www.7-zip.org/). We do not publish the files in any other format.
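If a scripting language is already at hand, the files can also be unpacked without a separate tool. The sketch below uses Python's standard `gzip` module; the function name and file paths are placeholders.

```python
import gzip
import shutil


def gunzip(src_path, dest_path):
    """Decompress a .gz file to dest_path, streaming so large files fit in memory."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
```

For example, `gunzip("course_dim.gz", "course_dim.txt")` would write the uncompressed tab-delimited file next to the download (both file names here are illustrative).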
|
|
How do I open flat files?
|
The files are tab-delimited. They can be opened with Excel, a text editor, Tableau, or any other program that can open ".txt" files. Once you open the raw .txt file, you will need to reference our schema documentation to add headers. This can be avoided by using the API to download the data into a data warehouse.
Instructure has also built an open-source command line tool that can add these headers. Use the link below for instructions on installation and usage. You will need to download your data with the CLI, and then use the "unpack" command.
GitHub - instructure/canvas-data-cli
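As an alternative to a spreadsheet, an unpacked file can be read programmatically with the headers supplied by hand. The sketch below uses Python's standard `csv` module; the column names passed in are assumptions that would need to come from the schema documentation.

```python
import csv
import io


def read_flat_file(lines, headers):
    """Read tab-delimited rows (any iterable of lines) into dicts keyed by headers."""
    reader = csv.DictReader(
        lines,
        fieldnames=headers,      # the files carry no header row, so supply one
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary data
    )
    return list(reader)


# In practice `lines` would be an open file, e.g. open(path, newline="", encoding="utf-8").
sample = io.StringIO("1\tIntro\n2\tAlgebra\n")
rows = read_flat_file(sample, ["id", "name"])
print(rows)
```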
|
|
Why can't headers be generated for the columns in the Canvas Data CSV export?
|
The primary reason is that most of the tables have more than one file. If we put headers at the top of one of the files (or all of them), it becomes more cumbersome to use simple command line tools like cat, awk, grep, wc, etc., to manipulate the data. Some customers wanted headers and some did not, so the choice was made not to include them.
The Canvas Data API can be used to get a JSON-based schema that can be used to generate headers.
The CLI can also be used for those customers wanting headers in their regular flat files.
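Generating a header row from the JSON schema could look roughly like the sketch below. The document shape used here (a top-level "schema" object keyed by table, each with a "columns" list of objects carrying a "name") is an assumption and should be checked against the schema actually returned by the API; the sample data is made up.

```python
import json


def headers_for(schema_doc, table):
    """Return the column names for a table, in schema order (assumed document shape)."""
    columns = schema_doc["schema"][table]["columns"]
    return [column["name"] for column in columns]


# Made-up sample standing in for a downloaded schema document.
sample = json.loads("""
{
  "version": "x",
  "schema": {
    "course_dim": {
      "tableName": "course_dim",
      "columns": [
        {"name": "id", "type": "bigint"},
        {"name": "name", "type": "varchar"}
      ]
    }
  }
}
""")

header_line = "\t".join(headers_for(sample, "course_dim"))
print(header_line)
```

The resulting line can be written once at the top of a combined file, which keeps the individual part files header-free for tools like cat and wc.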
|
|
Is there a way to download all the data files at once?
|
Yes, this is what the API is meant for (Canvas Data Portal API).
A CLI tool is also available: https://www.npmjs.com/package/canvas-data-cli
|
|
Where is the API documentation?
|
Canvas Data Portal
|
|
What exactly is "Hosted Data Services"?
|
Hosted Data Service is a service that allows Instructure to manage Canvas Data for you by automating the loading of Canvas Data into an Amazon Redshift instance.
While using this service, Instructure ensures that your data stays up to date, handles any schema or process changes, and manages the large data set for you. This allows your team to focus on querying the data, which can be done with any tool that supports ODBC connections.
For more information and pricing, please contact your CSM.
|
|
Is there any sort of orientation discussion for Canvas Data?
|
If you need an orientation to Canvas Data, please reach out to your CSM to schedule one.
|
|
Is there consulting for Canvas Data?
|
Yes. Canvas Data consulting is available for assistance in understanding the Canvas Data schema and creating reports based on it.
For more information about Canvas Data consulting and pricing, please contact your CSM.
|
|
What data is currently not in Canvas Data?
|
The data available in Canvas Data is a fraction of the data available through the main Canvas API endpoints (https://canvas.instructure.com/doc/api/all_resources.html). Rather than list all of the items not available in Canvas Data, it's best to review the Canvas Data schema to see the data that is available.
These items are not in Canvas Data and are asked about most frequently:
-
"Total Activity", as seen in the "Users" tab within a Canvas course
-
Syllabus (Calendar portion)
-
Assignment Rubrics
-
Quiz Question Answer Submissions
-
Calendar Events & Scheduler
-
ePortfolios
|
|
Some of the tables listed in the Canvas Data schema are not in my daily data dumps. Why is that?
|
We do not provide data for empty tables. If you are sure that there should be data in your missing tables, please reach out to our support staff (canvasdatahelp@instructure.com).
|
|
Why do I sometimes see duplicate files for the same date & time in my Canvas Data Portal?
|
This is a known issue in Canvas Data that happens from time to time. Occasionally, one of our jobs gets a false negative health check. As a result, we start a new, duplicate job to eliminate any possibility of missing files. This is rare and should resolve on its own with the next run.
Note: While the rows are in a different order, the content is the same.
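To confirm that two such files really hold the same content, the rows can be compared while ignoring their order. The sketch below works on unpacked files and loads both into memory, so it is only a starting point for very large tables; the function name and paths are placeholders.

```python
from collections import Counter


def same_rows(path_a, path_b):
    """Return True if both files contain the same rows, regardless of row order."""
    def row_counts(path):
        with open(path, "rb") as handle:
            # Strip the newline so a reordered last line still compares equal.
            return Counter(line.rstrip(b"\r\n") for line in handle)

    return row_counts(path_a) == row_counts(path_b)
```

Using a `Counter` rather than a set means repeated rows are also checked for matching counts.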
|