Canvas Data 2: Features and Timelines

Instructure

Canvas Data Product Evaluation

Canvas Data History 

Canvas Data is one of the Instructure data products built to provide Canvas customers with LMS transactional and web server log data. 

  • The original star schema concept delivers a partially denormalized schema for Canvas and Catalog transactional data.
  • The star schema supports 50 facts and 65 dimensions, and the data latency ranges from 24 to 48 hours based on the type of dataset.
  • Canvas Data mostly provides keys to data dumps for snapshot data—it does not support data filtering or data updates.

Introduction to Canvas Data 2 

We are pleased to introduce the next generation of the Canvas Data product. The product builds on years of continuous customer feedback and data research, and includes a number of cutting-edge data technologies and a rich selection of LMS ecosystem datasets.

 

The mission of Canvas Data 2 is to enable Canvas customers to easily find, filter, and understand the variety of Canvas data in a timely manner.

  • Canvas Data 2 is not an analytics or reporting tool, but it is built to share high fidelity source data to power schools' analytics and custom reporting initiatives.
  • Canvas Data 2 provides access to low-latency transactional and operational data, collected across various educational products and optimized for bulk transfer. 

 

Data is organized into datasets, which are more granular than the Canvas Data star schema.

 

Canvas Data Comparison Table

* features not included in the initial beta release but will most likely be rolled out approximately 6 months afterward

Features                                   | Canvas Data   | Canvas Data 2
Latency                                    | 24–48 hours   | 4 hours
Table snapshot                             | Yes           | Yes
Table deltas/updates                       | No            | Yes
CLI                                        | Yes           | Yes
API                                        | Yes           | Yes
UI downloads                               | Yes           | No
Schema available in API Documentation page | No            | Yes
Star Schema                                | Yes           | No
Beta Schema                                | No            | Yes
Schema versioning                          | Yes           | Yes
Canvas LMS data                            | 65 dimensions | 89 unique datasets
Weblogs aka requests                       | Yes           | Yes
Catalog data                               | Yes           | *
New Quizzes                                | No            | *
Outcomes data                              | No            | *
File format                                | tsv           | json, csv, parquet *


Canvas Data 2 Overview

Expected Behavior

In Canvas Data 2, the following behaviors can be expected:

  1. Possible duplicate records mainly in the update/delta files
  2. Varied data latency: the SLA stays at ≤ 4 hours
  3. Historical request data omission — customers are advised to pull all historical requests from Canvas Data 
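Since duplicate records are possible in the update/delta files, a consumer may want to drop repeats before loading. The following is a minimal sketch, assuming the duplicates are exact repeats of a record and that the file has already been parsed into Python dictionaries; the field names shown are illustrative only and are not taken from the published schema.

```python
import json


def drop_exact_duplicates(records):
    """Return records with exact repeats removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for record in records:
        # Canonical JSON text (sorted keys) serves as a hashable fingerprint.
        fingerprint = json.dumps(record, sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Illustrative rows; "id" and "workflow_state" values are made up for this example.
rows = [
    {"id": 1, "workflow_state": "active"},
    {"workflow_state": "active", "id": 1},  # same record, different key order
    {"id": 2, "workflow_state": "deleted"},
]
deduplicated = drop_exact_duplicates(rows)
```

If duplicates turn out to differ in some metadata field, a fingerprint built from the record key and change identifier would be a better fit than the whole record.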

Authentication and Authorization (authn&authz)

The Canvas LMS supports the Canvas Data 2 authn & authz mechanism, which means customers can use Canvas access tokens to access the Canvas Data 2 API and command line interface (CLI).

API and CLI

While we are planning on supporting both the API and CLI, use of the CLI is strongly recommended. The CLI allows customers to quickly and efficiently filter data at the sub-command level prior to downloading it, which helps to avoid complex API logic.

 

Unique Datasets 

Canvas Data 2 unique datasets will answer the majority of the needs our customers voiced during the community survey conducted by our product management team. Here are some of them:

 

  • Modules
  • New Quizzes *
  • Account: roles, account users
  • Rubrics
  • Outcomes
  • Originality Reports [plagiarism-related data]
  • Conversations
  • Attachments
  • Master Courses
  • Wikis
  • Developer Keys
  • Calendar
  • Catalog *
  • Faculty Journal 
  • User Access Tokens [metadata only, for developer-level tokens]
  • LTI tools data
  • User Asset Access [user logon data]

 

 *  features not included in the initial beta release but will most likely be rolled out approximately 6 months afterward 

Schema Versioning and Documentation 

Canvas Data 2 documentation will be hosted in the Instructure API Documentation page: https://canvas.instructure.com/doc/api/.

 

The Canvas Data 2 public schema will be versioned; any updates (additions and deletions) will create a new version. Customers will also be able to view the beta version of Canvas Data 2 schema, which allows customers to view new changes prior to the changes being released to Canvas production. This behavior is not to be confused with accessing Canvas Data 2 directly in the beta environment, which will not be supported.

 

Operational Data: Weblogs aka Request Table

Canvas Data 2 will still offer access to the weblogs dataset with latency ≤ 4 hours. Granular data filtering (e.g., by request_id, request timestamp, or user_id) prior to download is considered a highly desired feature and is currently undergoing additional research.

Transactional Data: Updates aka Deltas

Canvas Data 2 will provide access to all changes occurring on a specific dataset within a default or custom time range. A user will be able to supply the starting date and time as a custom parameter. The updates file will contain a log of transactions, each carrying metadata.orderId and metadata.status. The metadata.orderId is a lexicographically sortable ULID that represents the order of the change in the source database. A user could pass a record's metadata.orderId to request all changes that happened after that record was updated. Updates files will only be available in Canvas Data 2 for 60 calendar days.
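To illustrate how an updates log could be consumed, here is a minimal sketch that replays changes in metadata.orderId order and remembers the last orderId as a cursor for the next request. Only metadata.orderId and metadata.status come from the description above. The record layout, the "id" key field, the status values "upserted" and "deleted", and the fake_ulid helper are all assumptions made for this example.

```python
def fake_ulid(n):
    """Hypothetical helper: builds a made-up, ULID-shaped 26-character string."""
    return "01EJ" + str(n).zfill(22)


def apply_updates(table, updates, key_field="id"):
    """Apply an update log to `table` (a dict keyed by record id) in source order.

    Returns the highest metadata.orderId seen, or None for an empty log.
    """
    # ULIDs sort lexicographically, so a plain string sort restores change order.
    ordered = sorted(updates, key=lambda rec: rec["metadata"]["orderId"])
    last_order_id = None
    for rec in ordered:
        if rec["metadata"]["status"] == "deleted":  # assumed status value
            table.pop(rec[key_field], None)
        else:
            table[rec[key_field]] = rec  # upsert; replaying a duplicate is harmless
        last_order_id = rec["metadata"]["orderId"]
    return last_order_id


table = {}
updates = [
    {"id": 7, "name": "Biology 101 (renamed)",
     "metadata": {"orderId": fake_ulid(2), "status": "upserted"}},
    {"id": 7, "name": "Biology 101",
     "metadata": {"orderId": fake_ulid(1), "status": "upserted"}},
    {"id": 9, "name": "Old course",
     "metadata": {"orderId": fake_ulid(3), "status": "upserted"}},
    {"id": 9, "metadata": {"orderId": fake_ulid(4), "status": "deleted"}},
]
cursor = apply_updates(table, updates)
```

Because the upsert is idempotent, this approach also tolerates repeated records in the log. The returned cursor is the value a client would keep in order to ask for later changes.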

Schema Changes

Canvas Data 2 introduces a new data schema that is closely aligned with the Canvas API schema. The following additional schema details will be introduced in the new product:

  1. Nullable indicator
  2. Source data type [PostgreSQL as the source database]
  3. Hive data type
  4. Foreign key
  5. Possible restricted values for fields such as workflow_state and context_type

JSON Data fields

Some data fields, such as student quiz responses, are stored in YAML fields in the source database. These fields will be released as JSON-formatted fields.
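A JSON-formatted field arrives as text inside the downloaded file, so it needs a second parsing step after the row itself is read. The sketch below shows this for a CSV file using only the Python standard library. The column name "answers" and the sample values are hypothetical and are not taken from the published schema.

```python
import csv
import io
import json

# Build a tiny CSV in memory; csv.writer handles quoting of the embedded JSON.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "answers"])  # "answers" is a hypothetical column name
writer.writerow([42, json.dumps({"question_1": "B", "question_2": ["A", "C"]})])
buffer.seek(0)

parsed_rows = []
for row in csv.DictReader(buffer):
    # The field arrives as text; json.loads turns it into Python objects.
    row["answers"] = json.loads(row["answers"]) if row["answers"] else None
    parsed_rows.append(row)
```

Note that csv.DictReader returns every plain column as a string, so numeric columns such as "id" still need explicit conversion.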

 

 

Canvas Data 2 Release Timelines 

 

September 2020: Canvas Data 2 early preview [access to sandbox data] 

Users in all regions can use Canvas Data 2 tools [CLI only] to learn the new schema and CLI commands. Access to customer-specific data will not be available.

 

Q3/Q4 2020: Canvas Data 2 public beta [access to production data]

Users in all regions can use Canvas Data 2 tools (API and CLI) to explore their own data.

 

Specific dates for both the early preview and the public beta will be provided when available.

 

Canvas Data Deprecation

As soon as Canvas Data 2 is released to beta and our team reviews feedback, we will announce the deprecation of Canvas Data. We are allocating six months for our customers to migrate from Canvas Data to Canvas Data 2 prior to us turning the old solution off. An official announcement will be made in advance to inform customers of the deprecation timeline. 

Note: Canvas Data services hosted through Instructure Professional Services will be updated prior to Canvas Data end of life.

Canvas Data to Canvas Data 2 Migration Plan

We anticipate the majority of our Canvas Data customers will create a plan ahead of time for migrating from Canvas Data. The plan will depend on the complexity of the customer's custom data warehouse and analytics implementation.

Major Migration Differences

Customer attention will be required, as the two versions of Canvas Data include the following major differences:

  1. New API routes
  2. New CLI tool 
  3. Authentication and Authorization mechanism
  4. Schema: the Canvas Data star schema will be removed, while the Canvas Data 2 schema will be closely aligned with the Canvas API schema
  5. Global object identifiers: Canvas Data 2 will not support global object identifiers but will provide the data points used in Canvas Data to construct global object IDs
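For customers who still need global object IDs, the conversion quoted in the comment thread below (g_shard_id * 10000000000000 + users.id) can be sketched as follows. The function names and the sample shard and local IDs are made up for illustration, and the reverse split assumes local IDs stay below the multiplier.

```python
SHARD_MULTIPLIER = 10_000_000_000_000  # 10**13, from the conversion quoted in the comments


def to_global_id(shard_id, local_id):
    """Combine a shard ID and a local object ID into a global ID."""
    return shard_id * SHARD_MULTIPLIER + local_id


def split_global_id(global_id):
    """Recover (shard_id, local_id), assuming local_id < SHARD_MULTIPLIER."""
    return divmod(global_id, SHARD_MULTIPLIER)


# Sample values are invented for this example.
example_global = to_global_id(1234, 56789)
example_parts = split_global_id(example_global)
```

As the thread also notes, global IDs can change when an account is moved to a new shard, so the local ID is the safer primary key.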

Migration Options

Customer migrations could include the following options:

  1.  Write a full integration with Canvas Data 2 while maintaining an active Canvas Data  integration
  2. Store Canvas Data 2 tables in the same data warehouse without creating a schema naming conflict. Canvas Data 2 tables have different naming conventions (e.g., submissions_dim in Canvas Data and submissions in Canvas Data 2)
  3.  Leverage Canvas Data 2 in all new reports and dashboards
  4.  Update existing reports with new Canvas Data 2 tables. Both Canvas Data and Canvas Data 2 contain the canvas_id of the object. Using both schemas to support a single view, report, or dashboard could be challenging because of differences in data latency
  5. Introduce Canvas Data 2 weblogs as soon as new routes are in place to pull data. The weblogs schema will remain unchanged. Note: weblogs will most likely be available from the moment Canvas Data 2 is enabled in the customer environment. However, all historical requests should be pulled from Canvas Data prior to switching to Canvas Data 2

Migration Documentation

The following documentation will be made available to customers to assist with migration:

  1. Canvas Data 2 Schema
  2. Canvas Data to Canvas Data 2 schema map
  3. Canvas Data 2 API and CLI documentation

Migration Questions

We know more questions may exist about the Canvas Data 2 migration. Questions may be asked using the Comments section of this blog post.


To request the schema for Canvas Data 2 and provide feedback about potential migration needs, please reach out to us via the Canvas Data 2 Request Form.

34 Comments
Surveyor II

Hi Oxana,

Looks great, and well documented. I clicked on the link last week to get the schema documentation, but haven't received anything back yet.

Regards,

Stuart.

Instructure

stuart.smith@rams.colostate.edu, it looks like you requested editor access to the form rather than the schema document. That's OK: I found your email and shared the CD2 Public Beta Schema with you. Please let us know if you are still having access problems.

Surveyor II

Hi Oxana,

Thanks for that, it does look quite different to the current data structure.

I can only see the fields for the score dataset on the Datasets_fields and I can't see how to expand the filter. Is this what I should expect?

Regards,

Stuart.

Instructure

Please take a look at the schema document when you have a chance and let me know if you are still having problems viewing it.

Surveyor II

That's better. Thanks Oxana, I'll have a good look at this today. Is there any documentation which maps CD1 columns to CD2?

Instructure

Not yet, but it will be coming soon. We are planning on releasing it as soon as we have the Early Access sandbox set up.

Explorer

This is a very exciting update! I am looking forward to the public beta this fall. 

Some questions came up when I read that Canvas access tokens will allow us to use the CD2 API and CLI. Which permissions do we need for this? Can we specify user accounts to grant/deny access to?

Thank you!

Instructure

stimme@emory.edu, you will need the account-level permission data services - manage. It can be scoped by sub-account. You can find it in your Canvas account permissions area; we currently use it to grant access to the Live Events subscription portal.

Surveyor II

Hi Oxana,

Would it be prudent to introduce separate credentials to restrict the ability to download Canvas data? At present anyone with admin access is able to use their credentials to download the data.

Regards,

Stuart.

Instructure

Hi Stuart,

Actually, that was our thought. By putting access behind the data services - manage permission, you will be able to create a new account-based role (e.g., a data analyst role) and grant the data services - manage permission to that role. You would also be able to disable this permission for account administrators, so that data access is restricted to one data-specific role. As for credentials, we are using Canvas for authentication and authorization: a user with the data services - manage permission will create a user token and provide it when calling the Canvas Data API or running the CLI to request data. Hope this answers your question and provides the flexibility you are looking for.

Regards,

Oxana

Adventurer II

Hi Oxana --

This is great -- I'm looking forward to checking this out! One question: when you say that the weblog (aka "requests") data will likely contain events starting with "the moment Canvas Data 2 is enabled in the customer environment", does that mean it'll start when the Q3/Q4 Public Beta begins?

Thanks!

--Colin

Instructure

Hi Colin,

Most likely, a customer will see requests starting from the moment Canvas Data 2 is turned on in their Canvas production environment. Canvas Data legacy will have all historical requests and will keep collecting and offering requests data as designed until the product is deprecated.

Thank you,

Oxana

Adventurer

Hi Oxana,

How will requests work for redshift customers? 

Will this always have all of the requests data or will we lose all of our historic requests when we move to canvas data 2?  

I use both redshift and local database system but I don't host requests in a local database because it's too much data for our local IT resources.  

Thanks!

Joni

Instructure

Hi Joni ,

Your Redshift instance is managed by Professional Services, and they are planning on changing it to consume Canvas Data 2. I would reach out to them for clarification. I believe the migration should not impact the data currently in your Redshift instance.

Thanks,

Oxana 

Surveyor

Please forgive some of these questions - I am very new to Canvas.

- I can't find some unique identifier for user other than name.  For example, is email address available? and/or is this (personal) data intentionally excluded from Canvas Data?

- Will Canvas Data 2 be introduced on the beta site before release  ? (And which API url would be used to access the data? The beta site shows the same api key and secret as production)

Instructure

  Q: I can't find some unique identifier for user other than name.  For example, is email address available? and/or is this (personal) data intentionally excluded from Canvas Data?

A: Canvas Data legacy: pseudonyms_dim.unique_name; Canvas Data 2: pseudonyms.unique_id [the unique login ID for the user, which is what the user uses to log in to Canvas]

Q:  Will Canvas Data 2 be introduced on the beta site before release  ? (And which API url would be used to access the data? The beta site shows the same api key and secret as production)

A: Canvas Data 2 will not be offered in the Canvas beta environment; however, we will be rolling out an early preview of the product in the Instructure sandbox in August, which you will be able to use as your test tool. A preview of the Canvas Data 2 schema in the beta environment will be supported. API routes to request data from Canvas Data 2 will be posted on the Instructure Canvas API site when they are available.

Adventurer

Depending on the report and requirements, I've also used the following identifiers in Canvas Data:

  • user_dim.canvas_id   -- this appears to be users.id in Canvas Data 2
  • user_dim.id  (when deidentification was necessary)    --- I do not see an equivalent so far in Canvas Data 2
Instructure

Hi Jeff,

Canvas Data legacy offers two types of identifiers: global and local. The local ID will be mapped to your Canvas Data 2 table PK. The global ID will not be included unless your institution has users from other schools that don't reside on the same shard/cluster as your account; those user IDs will be globalized. To keep using global IDs, you could use the following conversion: g_shard_id * 10000000000000 + users.id. We will be including your g_shard_id in the file schema so you will be able to use it to construct your global IDs if necessary. We don't recommend relying on global IDs as the primary identifier for an object, because those IDs are subject to change any time we migrate your account to a new shard, which we do from time to time to make sure our databases scale properly with your current size and usage.

Thank you,

Oxana

Surveyor

Hi Oxana,

I completed the request for the Canvas Data 2 schema but haven't received anything yet.  Completed it again today and notified our CSM.  When might we receive the schema document?

Thanks, Nancy

Adventurer

@oxana 

Thank you for all of your work and for keeping us informed about the future changes!

I don't have access to the schema either.  I filled out the form some time ago.  Should I have gotten an email?  

Joni

Surveyor II

Hi Nancy, Joni,

Oxana sent me a link to the public schema details, hope you can also see it:

https://docs.google.com/spreadsheets/d/1axjhLwPY4N16SAf61-X8xj3kSg_Ey-PXvcsUTtojRoo/edit#gid=1473158...

Regards,

Stuart.

Surveyor II

Hi Oxana,

Has there been any update on Canvas Data 2 beta availability and rollout timing?

Thanks & regards,

Stuart. (stuart.smith01@adelaide.edu.au)

Adventurer

Thanks for the link to the schema, Stuart! 

Thanks for giving me access to it, Oxana! 

Surveyor

Hi all!

Has there been any update on Canvas Data 2 beta availability or rollout timing?

Our university is starting with Canvas this September and I don't think it's worthy to use the Canvas Data 1, but we need to have access to the data.

Thanks!!

Surveyor

Any update on when the CLI access to sandbox will be available?

Surveyor

May I have the schema of Canvas Data 2 too?

Also, I would like to know how to access sandbox with CLI? Please advise.

 

Thanks

Surveyor II

Hi @ctcck are you able to access the draft schema spreadsheet using the hyperlink a few posts up?

Surveyor

Hi @oxana,

How can we sign up for the early preview so that we can evaluate the Canvas Data 2 ecosystem.

Hoping to hear from you soon.

Thanks,

Manthan Gogari

 

Instructure

Hi, everyone,

We have been carefully allocating resources to ensure all our existing services stay up and scale under the unprecedented pressure from the volume of requests coming to the Canvas application. Moreover, Canvas Data 2 Early Access deliverable has been progressing steadily to the final stage but had to be paused because of the resource support currently required from the Canvas Data team.

We anticipate announcing the availability of Canvas Data 2 Early Access [sandbox] within the next two weeks. Thank you for your patience!

Instructure

Hi, everyone,

Canvas Data Early Access information can be found in the Canvas Data 2 Early Access information page. Thank you!

Surveyor II

Hi Oxana,

Not sure how this has happened, but the contents of the About the Author box at the top of this page are my details from our PeopleSoft HR system.

Regards,

Stuart.

Surveyor

It's good to see that the Quiz dataset has the only_visible_to_overrides field. This was something I flagged a couple of years ago, as the lack of a visibility field in quiz_dim meant I had to manually look up whether practice quizzes with overrides also had a version that applied to everyone else.

Surveyor

Just got dap setup and imported a few test snapshots using Talend. Works great and super easy. Is there a way to format the output csv file names. I was just trying to save a few steps after download. I do not need the date / time on the name. Just checking before I go digging through the lib code.

Thanks

Instructure

@BendingUnit22 

We don’t have a way of changing file names. I’ll keep it as a feature request, thanks for the feedback!

Thanks,

Oxana