Canvas Data 2: Features and Timelines

oxana
Community Participant

[2020-12-21 Update: For the latest updates, please visit the Canvas Roadmap]

[2020-09-18 Update: Canvas Data Early Access information can be found in the Canvas Data 2 Early Access information page]

Canvas Data Product Evaluation

Canvas Data History 

Canvas Data is one of the Instructure data products built to provide Canvas customers with LMS transactional and web server log data. 

  • The original star schema concept delivers a partially denormalized schema for Canvas and Catalog transactional data.
  • The star schema supports 50 facts and 65 dimensions, and the data latency ranges from 24 to 48 hours based on the type of dataset.
  • Canvas Data mostly provides keys to data dumps for snapshot data—it does not support data filtering or data updates.

Introduction to Canvas Data 2 

We are pleased to introduce the next generation of the Canvas Data product. It draws on years of continuous customer feedback and data research, and includes a number of cutting-edge data technologies and a rich selection of LMS ecosystem datasets.

 

The mission of Canvas Data 2 is to enable Canvas customers to easily find, filter, and understand the variety of Canvas data in a timely manner.

  • Canvas Data 2 is not an analytics or reporting tool, but it is built to share high fidelity source data to power schools' analytics and custom reporting initiatives.
  • Canvas Data 2 provides access to low-latency transactional and operational data, collected across various educational products and optimized for bulk transfer. 

 

Data is referenced as datasets and provides more granular data than the Canvas Data star schema.

 

Canvas Data Comparison Table

* Features not included in the initial beta release; they will most likely be rolled out approximately 6 months afterward.

Features

Canvas Data 

Canvas Data 2

Latency

24 – 48 hours

4 hours

Table snapshot

Table deltas/updates

 

CLI

API

UI downloads

 

Schema available in API Documentation page 

 

Star Schema

 

Beta Schema

 

Schema versioning

Canvas LMS data

65 dimensions

89 unique datasets

Weblogs aka requests

Catalog data

*

New Quizzes

 

*

Outcomes data

 

*

File format

tsv

json

 csv

 parquet *

 

Canvas Data 2 Overview

Expected Behavior

In Canvas Data 2, the following behaviors can be expected:

  1. Possible duplicate records, mainly in the update/delta files
  2. Varied data latency, with the SLA staying at ≤ 4 hours
  3. Omission of historical request data: customers are advised to pull all historical requests from Canvas Data
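
Because duplicate records are possible in update/delta files, a consumer may want to drop repeats before loading. The following is a minimal sketch, not the product's own tooling. It assumes each record is a dictionary with an `id` key and a `metadata.orderId` value; that record layout, the `fake_order_id` helper, and the sample values are all hypothetical.

```python
def fake_order_id(n):
    """Build a placeholder 26-character ULID-style string (illustration only)."""
    return f"01F{n:023d}"


def drop_duplicates(records):
    """Keep the first occurrence of each (id, orderId) pair, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        # Assumed layout: primary key in "id", change order in metadata.orderId.
        marker = (rec["id"], rec["metadata"]["orderId"])
        if marker not in seen:
            seen.add(marker)
            unique.append(rec)
    return unique


sample = [
    {"id": 1, "metadata": {"orderId": fake_order_id(1)}, "title": "Quiz 1"},
    {"id": 1, "metadata": {"orderId": fake_order_id(1)}, "title": "Quiz 1"},  # repeat
    {"id": 2, "metadata": {"orderId": fake_order_id(2)}, "title": "Quiz 2"},
]
unique = drop_duplicates(sample)
```

Keying on the pair rather than on `id` alone keeps distinct changes to the same record while discarding exact repeats of one change.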

Authentication and Authorization (authn&authz)

The Canvas LMS supports the Canvas Data 2 authn & authz mechanism, which means customers can use Canvas access tokens to access the Canvas Data 2 API and command line interface (CLI).

API and CLI

While we plan to support both the API and the CLI, use of the CLI is strongly recommended: it allows customers to quickly and efficiently filter data at the sub-command level before downloading it, which helps avoid complex API logic.

 

Unique Datasets 

Canvas Data 2 unique datasets will answer the majority of the needs our customers voiced during the community survey conducted by our product management team. Here are some of them:

 

  • Modules
  • New Quizzes *
  • Account: roles, account users
  • Rubrics
  • Outcomes
  • Originality Reports [plagiarism-related data]
  • Conversations
  • Attachments
  • Master Courses
  • Wikis
  • Developer Keys
  • Calendar
  • Catalog *
  • Faculty Journal
  • User Access Tokens, metadata only [for developer-level tokens]
  • LTI tools data
  • User Asset Access [user logon data]

 

 * Features not included in the initial beta release; they will most likely be rolled out approximately 6 months afterward.

Schema Versioning and Documentation 

Canvas Data 2 documentation will be hosted in the Instructure API Documentation page: https://canvas.instructure.com/doc/api/.

 

The Canvas Data 2 public schema will be versioned; any updates (additions and deletions) will create a new version. Customers will also be able to view the beta version of Canvas Data 2 schema, which allows customers to view new changes prior to the changes being released to Canvas production. This behavior is not to be confused with accessing Canvas Data 2 directly in the beta environment, which will not be supported.

 

Operational Data: Weblogs aka Request Table

Canvas Data 2 will continue to offer access to the weblogs dataset with latency ≤ 4 hours. Granular data filtering (e.g., by request_id, request timestamp, or user_id) prior to download is a highly desired feature and is currently undergoing additional research.

Transactional Data: Updates aka Deltas

Canvas Data 2 will provide access to all changes occurring on a specific dataset within a default or custom timeline. A user will be able to provide the starting date and time as a custom parameter. The Updates file will contain a log of transactions, each containing metadata.orderId and metadata.status. The metadata.orderId is a lexicographically sortable ULID that represents the order of change in the source database. A user could use a record's metadata.orderId to request all changes that happened since that record was updated. Updates files will only be available in Canvas Data 2 for 60 calendar days.
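
One way to consume such a log is to replay it in metadata.orderId order against a local copy of the table. The sketch below is an illustration under assumptions, not the documented behavior: the record layout (an `id` key plus a `metadata` dictionary), the status value `"deleted"`, and the placeholder order IDs are hypothetical, and the real status values should be taken from the published schema.

```python
def fake_order_id(n):
    """Build a placeholder 26-character ULID-style string (illustration only)."""
    return f"01F{n:023d}"


def apply_updates(table, updates, deleted_status="deleted"):
    """Replay an updates log onto `table` (a dict keyed by primary key).

    Returns the highest orderId applied, which could serve as the starting
    point for the next request, or None if the log was empty.
    """
    last_order_id = None
    # ULIDs sort lexicographically, so plain string ordering gives change order.
    for rec in sorted(updates, key=lambda r: r["metadata"]["orderId"]):
        if rec["metadata"]["status"] == deleted_status:  # assumed status value
            table.pop(rec["id"], None)
        else:
            table[rec["id"]] = rec  # insert a new row or overwrite an old one
        last_order_id = rec["metadata"]["orderId"]
    return last_order_id


table = {
    1: {"id": 1, "title": "Original"},
    2: {"id": 2, "title": "To be removed"},
}
updates = [
    {"id": 2, "metadata": {"orderId": fake_order_id(2), "status": "deleted"}},
    {"id": 1, "metadata": {"orderId": fake_order_id(1), "status": "updated"}, "title": "Revised"},
    {"id": 3, "metadata": {"orderId": fake_order_id(3), "status": "created"}, "title": "New"},
]
checkpoint = apply_updates(table, updates)
```

Treating every non-deleted record as an upsert keyed on the primary key means the same log can be replayed more than once without creating duplicate rows.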

Schema Changes

Canvas Data 2 introduces a new data schema that is closely aligned with the Canvas API schema. The following additional schema details will be introduced in the new product:

  1. Nullable indicator
  2. Source data type [Postgres as the source database]
  3. Hive data type
  4. Foreign key
  5. Possible restricted values for fields such as workflow_state and context_type

JSON Data fields

Some data fields, such as student quiz responses, are stored in YAML-typed fields in the source database. These fields will be released as JSON-formatted fields.
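
A consumer reading a flat file format would typically decode such a field after loading the row. This is a small sketch using Python's standard `json` module. The column name `submission_data` and the sample payload are hypothetical and only show the shape of the step.

```python
import json


def parse_json_field(value):
    """Decode a JSON-formatted field, tolerating empty and pre-decoded values."""
    if value is None or value == "":
        return None
    if isinstance(value, (dict, list)):
        return value  # already decoded, e.g. when the file itself was JSON
    return json.loads(value)


# Hypothetical row: the field arrives as a JSON string inside a CSV/TSV cell.
row = {"id": 7, "submission_data": '{"question_1": "B", "question_2": ["A", "C"]}'}
answers = parse_json_field(row["submission_data"])
```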

 

 

Canvas Data 2 Release Timelines 

 

September 2020: Canvas Data 2 early preview [access to sandbox data] 

Users in all regions can use Canvas Data 2 tools [CLI only] to learn the new schema and CLI commands. Access to customer-specific data will not be available.

 

Q3/Q4 2020: Canvas Data 2 public beta [access to production data]

Users in all regions can use Canvas Data 2 tools (API and CLI) to explore their own data.

 

Specific dates for both the early preview and the public beta will be provided when available.

 

Canvas Data Deprecation

As soon as Canvas Data 2 is released to beta and our team reviews feedback, we will announce the deprecation of Canvas Data. We are allocating six months for our customers to migrate from Canvas Data to Canvas Data 2 prior to us turning the old solution off. An official announcement will be made in advance to inform customers of the deprecation timeline. 

Note: Canvas Data services hosted through Instructure Professional Services will be updated prior to Canvas Data end of life.

Canvas Data to Canvas Data 2 Migration Plan

We anticipate the majority of our Canvas Data customers will plan their migration from Canvas Data ahead of time. The plan will depend on the complexity of the customer's custom data warehouse and analytics implementation.

Major Migration Differences

Customer attention will be required, as the two versions of Canvas Data include the following major differences:

  1. New API routes
  2. New CLI tool 
  3. Authentication and Authorization mechanism
  4. Schema: the Canvas Data star schema will be removed, while the Canvas Data 2 schema will be closely aligned with the Canvas API schema
  5. Global object identifiers: Canvas Data 2 will not support global object identifiers but will provide the data points used in Canvas Data to construct global object IDs

Migration Options

Customer migrations could include the following options:

  1. Write a full integration with Canvas Data 2 while maintaining an active Canvas Data integration
  2. Store Canvas Data 2 tables in the same data warehouse without creating a schema naming conflict. Canvas Data 2 tables have different naming conventions (e.g., Canvas Data submissions_dim and Canvas Data 2 submissions)
  3. Leverage Canvas Data 2 in all new reports and dashboards
  4. Update existing reports with new Canvas Data 2 tables. Both Canvas Data and Canvas Data 2 contain the canvas_id of the object. Using both schemas to support a single view, report, or dashboard could be challenging because of differences in data latency
  5. Introduce Canvas Data 2 Weblogs as soon as new routes are in place to pull data. The weblogs schema will remain unchanged. Note: weblogs will most likely be available from the moment Canvas Data 2 is enabled in the customer environment. However, all historical requests should be pulled from Canvas Data prior to switching to Canvas Data 2

Migration Documentation

The following documentation will be made available to customers to assist with migration:

  1. Canvas Data 2 Schema
  2. Canvas Data to Canvas Data 2 schema map
  3. Canvas Data 2 API and CLI documentation

Migration Questions

We know more questions may exist about the Canvas Data 2 migration. Questions may be asked using the Comments section of this blog post.


To request the schema for Canvas Data 2 and provide feedback about potential migration needs, please reach out to us via the Canvas Data 2 Request Form.

47 Comments
a1222252
Community Participant

Hi Oxana,

Looks great, and well documented. I clicked on the link last week to get the schema documentation, but haven't received anything back yet.

Regards,

Stuart.

oxana
Community Participant

@stuart_smith It looks like you requested editor access to the form, not the schema document. That's OK: I found your email and shared the CD2 Public Beta Schema with you. Please let us know if you are still having access problems.

a1222252
Community Participant

Hi Oxana,

Thanks for that, it does look quite different to the current data structure.

I can only see the fields for the score dataset on the Datasets_fields and I can't see how to expand the filter. Is this what I should expect?

Regards,

Stuart.

oxana
Community Participant

Please take a look at the schema document when you have a chance and let me know if you are still having problems viewing it.

a1222252
Community Participant

That's better. Thanks Oxana, I'll have a good look at this today. Is there any documentation which maps CD1 columns to CD2?

oxana
Community Participant

Not yet, but it will be coming soon. We are planning on releasing it as soon as we have the Early Access sandbox set up.

stimme
Community Contributor

This is a very exciting update! I am looking forward to the public beta this fall. 

Some questions came up when I read that Canvas access tokens will allow us to use the CD2 API and CLI. Which permissions do we need for this? Can we specify user accounts to grant/deny access to?

Thank you!

oxana
Community Participant

@stimme You will need the account-level permission data services - manage. It can be scoped by sub-account. You can find it in your Canvas account permissions area; we currently use it to grant access to the Live Events subscription portal.

a1222252
Community Participant

Hi Oxana,

Would it be prudent to introduce separate credentials to restrict the ability to download Canvas data? At present anyone with admin access is able to use their credentials to download the data.

Regards,

Stuart.

oxana
Community Participant

Hi Stuart,

Actually, that was our thought: by putting access behind the data services - manage permission, you will be able to create a new account-based role (e.g., a data analyst role) and grant the data services - manage permission to that role. You would also be able to disable this permission for account administrators, so data access is restricted to one data-specific role. When it comes to credentials, we are using Canvas for our authentication and authorization: a user with the data services - manage permission will create a user token and provide it when calling the Canvas Data API or running the CLI to request data. Hope this answers your question and provides the flexibility you are looking for.

Regards,

Oxana

ColinMurtaugh
Community Champion

Hi Oxana --

This is great -- I'm looking forward to checking this out! One question: when you say that the weblog (aka "requests") data will likely contain events starting with "the moment Canvas Data 2 is enabled in the customer environment", does that mean it'll start when the Q3/Q4 Public Beta begins?

Thanks!

--Colin

oxana
Community Participant

Hi Colin,

Most likely a customer will see requests starting from the moment Canvas Data 2 is turned on in their Canvas production environment. Canvas Data legacy will have all historical requests and will keep collecting and offering requests data as designed until the product is deprecated.

Thank you,

Oxana

millerjm
Community Contributor

Hi Oxana,

How will requests work for redshift customers? 

Will this always have all of the requests data or will we lose all of our historic requests when we move to canvas data 2?  

I use both redshift and local database system but I don't host requests in a local database because it's too much data for our local IT resources.  

Thanks!

Joni

oxana
Community Participant

Hi Joni ,

Your Redshift instance is managed by Professional Services, and they are planning on changing your Redshift to consume Canvas Data 2. I would reach out to them and ask for clarification. I believe the migration should not impact your current data in Redshift.

Thanks,

Oxana 

sor1
Community Member

Please forgive some of these questions - I am very new to Canvas.

- I can't find some unique identifier for user other than name.  For example, is email address available? and/or is this (personal) data intentionally excluded from Canvas Data?

- Will Canvas Data 2 be introduced on the beta site before release  ? (And which API url would be used to access the data? The beta site shows the same api key and secret as production)

oxana
Community Participant

Q: I can't find some unique identifier for user other than name. For example, is email address available? and/or is this (personal) data intentionally excluded from Canvas Data?

A: Canvas Data legacy: pseudonyms_dim.unique_name; Canvas Data 2: pseudonyms.unique_id [the unique login ID for the user; this is what the user uses to log in to Canvas]

Q: Will Canvas Data 2 be introduced on the beta site before release? (And which API url would be used to access the data? The beta site shows the same api key and secret as production)

A: Canvas Data 2 will not be offered in the Canvas beta environment. However, we will be rolling out an early preview of the product in the Instructure sandbox in August, and you will be able to use it as your test tool. A Canvas Data 2 schema preview in the beta environment will be supported. API routes to request data from Canvas Data 2 will be posted on the Instructure Canvas API site when they are available.

Jeff_F
Community Champion

Depending on the report and requirements, I've also used the following identifiers in Canvas Data:

  • user_dim.canvas_id   -- this appears to be users.id in Canvas Data 2
  • user_dim.id  (when deidentification was necessary)    --- I do not see an equivalent so far in Canvas Data 2
oxana
Community Participant

Hi Jeff,

Canvas Data legacy offers two types of identifiers: global and local. The local ID will be mapped to your Canvas Data 2 table PK. The global ID will not be included unless your institution has users from other schools that don't reside on the same shard/cluster as your account; those user IDs will be globalized. To keep using global IDs, you could use the following conversion: g_shard_id * 10000000000000 + users.id. We will be including your g_shard_id in the file schema so you will be able to use it to craft your global IDs if necessary. We don't recommend relying on global IDs as your primary identifier for an object, simply because those IDs are subject to change any time we migrate your account to a new shard, which we do from time to time to make sure our databases scale properly to your current size/usage.

Thank you,

Oxana
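
The conversion given in the reply above (g_shard_id * 10000000000000 + users.id) can be sketched in a few lines. The function names and the sample shard and user IDs here are made up for illustration. The reverse split assumes the local ID is smaller than the multiplier.

```python
SHARD_MULTIPLIER = 10_000_000_000_000  # 10**13, the factor quoted in the reply


def to_global_id(g_shard_id, local_id):
    """Combine a shard ID and a local ID into a global ID."""
    return g_shard_id * SHARD_MULTIPLIER + local_id


def split_global_id(global_id):
    """Recover (g_shard_id, local_id), assuming local_id < SHARD_MULTIPLIER."""
    return divmod(global_id, SHARD_MULTIPLIER)


# Hypothetical values: shard 1234, local users.id 567.
global_id = to_global_id(1234, 567)
shard, local = split_global_id(global_id)
```

Since the reply notes that shard assignments can change when an account is migrated, a global ID built this way is better treated as a derived value than stored as a primary key.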

nboettger
Community Member

Hi Oxana,

I completed the request for the Canvas Data 2 schema but haven't received anything yet.  Completed it again today and notified our CSM.  When might we receive the schema document?

Thanks, Nancy

millerjm
Community Contributor

@oxana 

Thank you for all of your work and for keeping us informed about the future changes!

I don't have access to the schema either.  I filled out the form some time ago.  Should I have gotten an email?  

Joni

a1222252
Community Participant

Hi Nancy, Joni,

Oxana sent me a link to the public schema details, hope you can also see it:

https://docs.google.com/spreadsheets/d/1axjhLwPY4N16SAf61-X8xj3kSg_Ey-PXvcsUTtojRoo/edit#gid=1473158...

Regards,

Stuart.

a1222252
Community Participant

Hi Oxana,

Has there been any update on Canvas Data 2 beta availability and rollout timing?

Thanks & regards,

Stuart. (stuart.smith01@adelaide.edu.au)

millerjm
Community Contributor

Thanks for the link to the schema, Stuart! 

Thanks for giving me access to it, Oxana! 

s_travieso
Community Member

Hi all!

Has there been any update on Canvas Data 2 beta availability or rollout timing?

Our university is starting with Canvas this September and I don't think it's worthwhile to adopt Canvas Data 1, but we need to have access to the data.

Thanks!!

doneal
Community Member

Any update on when the CLI access to sandbox will be available?

ctcck
Community Member

May I have the schema of Canvas Data 2 too?

Also, I would like to know how to access sandbox with CLI? Please advise.

 

Thanks

a1222252
Community Participant

Hi @ctcck are you able to access the draft schema spreadsheet using the hyperlink a few posts up?

mgogari
Community Member

Hi @oxana,

How can we sign up for the early preview so that we can evaluate the Canvas Data 2 ecosystem?

Hoping to hear from you soon.

Thanks,

Manthan Gogari

 

oxana
Community Participant

Hi, everyone,

We have been carefully allocating resources to ensure all our existing services stay up and scale under the unprecedented pressure from the volume of requests coming to the Canvas application. Moreover, Canvas Data 2 Early Access deliverable has been progressing steadily to the final stage but had to be paused because of the resource support currently required from the Canvas Data team.

We anticipate announcing the availability of Canvas Data 2 Early Access [sandbox] in the next two weeks. Thank you for your patience!

oxana
Community Participant

Hi, everyone,

Canvas Data Early Access information can be found in the Canvas Data 2 Early Access information page. Thank you!

a1222252
Community Participant

Hi Oxana,

Not sure how this has happened, but the contents of the About the Author box at the top of this page are my details from our PeopleSoft HR system.

Regards,

Stuart.

a1222679
Community Contributor

It's good to see that the Quiz dataset has the only_visible_to_overrides field. This was something I flagged up a couple of years ago, as the lack of a visibility field in quiz_dim meant I had to manually look up whether practice quizzes with overrides also had a version that applied to everyone else.

BendingUnit22
Community Member

Just got dap set up and imported a few test snapshots using Talend. Works great and super easy. Is there a way to format the output csv file names? I was just trying to save a few steps after download. I do not need the date / time on the name. Just checking before I go digging through the lib code.

Thanks

oxana
Community Participant

@BendingUnit22 

We don’t have a way of changing file names. I’ll keep it as a feature request, thanks for the feedback!

Thanks,

Oxana

BendingUnit22
Community Member

I am looking in the delta files. I know they are changes since the time I specify for the request. Are these updates, deletes, and new records? I guess I am asking: if I request the quiz_questions data, which would be everything up to now, would tomorrow's delta data be inserts?

If someone changed an old question text and saved, would I use metadata.order_id from the delta file or the PK (id) to remove it from the main table and then insert the delta data, or would I update it?

Hope this makes sense.

Cheers

namoguy
Community Member

Clearly this post needs to be updated. We are at the end of Q4 2020 and I just got email pointing to Feb 2021 maybe.  Can anyone help?  I have spent hours over the past 2 days trying to figure out what is right and what is wrong information. My Assessment Director keeps pointing me to this page as the definitive proof Data 2 is available NOW! and I need to get it to him yesterday for our upcoming accreditation review.  So please update this section in the original post with more updated information. 

 

Q3/Q4 2020: Canvas Data 2 public beta [access to production data]

Users in all regions can use Canvas Data 2 tools (API and CLI) to explore their own data.

 

erinhmcmillan
Community Team

@namoguy Timelines for Canvas Data 2 have been adjusting. Blogs aren't to be considered official updates. The latest updates for anything related to timelines are going to be found in the Canvas Roadmap, which does indicate February 2021.

Thanks!

Erin

kiki1
Community Member

Hi, @oxana 

I am working on a project for building dashboards and reports with canvas data.  I saw the Canvas Data 2 CLI is available.  How can I enable this in my own canvas env?  I have deployed the docker version in my own Azure machine. 

Thanks

pgo586
Community Participant

While we are not Canvas Data subscribers, we have been using Canvas Data to develop dashboards/visualizations that are part of a production LTI tool. Our code depends on third-party software (a vendor's special 'Canvas Data' plugin) that ingests Canvas Data into the vendor's platform (Splunk). Are you working at all with this vendor to ensure that they update their plugin for Canvas Data ingestion to conform to Canvas Data 2? Assuming that this is not going to happen before the Beta product release, then we are concerned that it may well take longer than 6 months for us to get everything in place (first, the vendor needs to update their plugin for data ingestion, and then we'll need to update our own code). A longer deprecation timeline for Canvas Data would definitely help us. 

nboettger
Community Member

Hi, We are really needing that mapping documentation; when will it be available?  Has anyone developed one of their own?  We're starting small, for a single dashboard, but have another project waiting in the wings to create additional dashboards and reporting.  I can't get the necessary resources until I can prove we have the mapping.  Can you help?

erinhmcmillan
Community Team

Hi @nboettger @pgo586 

We are working on creating a user group for Canvas Data 2 users, which will include all information about Canvas Data 2. Not everyone who signed up to be part of Canvas Data 2 will be included in the beta initially; we are rolling it out slowly to institutions. We will ensure everyone who wants to be involved can do so, and we won't deprecate anything without advance notice.

Thanks!

kdoherty2
Community Member

Hi Erin,

What is the status of the existing Early Access testing?

https://community.canvaslms.com/t5/Canvas-Data-Users/Canvas-Data-2-Early-Access/ba-p/408235

Many users have reported issues connecting to the test data and there has been no follow-up for a while on that post.  Should we skip the early access and wait for further information?

Thanks.

 

 

 

pgo586
Community Participant

Thanks @erinhmcmillan for working on the creation of a user group for Canvas Data 2. Will there be a special way of joining, or will everybody that has contacted you re: the topic be included automatically in it?

ross_bell
Community Member

I have the same question as @pgo586 and would like to be included in the Canvas Data 2 user group.  Thank you.

a1222679
Community Contributor

I'd also like to be included in the user group please.

s_travieso
Community Member

Please include me in the group as well... Thanks!

erinhmcmillan
Community Team

Hi, all,

Yes, please stand by for additional information about Canvas Data 2. 

We appreciate the interest in Canvas Data 2! We are no longer accepting requests to participate in the beta, as we have more interested institutions than we can currently accommodate.

Thanks,

Erin