
Canvas Data Product Evaluation

Canvas Data History 

Canvas Data is one of the Instructure data products built to provide Canvas customers with LMS transactional and web server log data. 

  • The original star schema concept delivers a partially denormalized schema for Canvas and Catalog transactional data.
  • The star schema supports 50 facts and 65 dimensions, and the data latency ranges from 24 to 48 hours based on the type of dataset.
  • Canvas Data mostly provides keys to data dumps for snapshot data—it does not support data filtering or data updates.

Introduction to Canvas Data 2 

We are pleased to introduce the next generation of the Canvas Data product. The product reflects years of continuous customer feedback and data research, and includes a number of cutting-edge data technologies and a rich selection of LMS ecosystem datasets.

 

The mission of Canvas Data 2 is to enable Canvas customers to easily find, filter, and understand the variety of Canvas data in a timely manner.

  • Canvas Data 2 is not an analytics or reporting tool, but it is built to share high fidelity source data to power schools' analytics and custom reporting initiatives.
  • Canvas Data 2 provides access to low-latency transactional and operational data, collected across various educational products and optimized for bulk transfer. 

In Canvas Data 2, data is referred to as datasets, which are more granular than the tables in the Canvas Data star schema.

 

Canvas Data Comparison Table

* Features not included in the initial beta release; they will most likely be rolled out approximately six months afterward.

Features | Canvas Data | Canvas Data 2
--- | --- | ---
Latency | 24 – 48 hours | 4 hours
Table snapshot | |
Table deltas/updates | |
CLI | |
API | |
UI downloads | |
Schema available in API Documentation page | |
Star Schema | |
Beta Schema | |
Schema versioning | |
Canvas LMS data | 65 dimensions | 89 unique datasets
Weblogs aka requests | |
Catalog data | | *
New Quizzes | | *
Outcomes data | | *
File format | tsv | json, csv, parquet *

 

Canvas Data 2 Overview

Expected Behavior

In Canvas Data 2, the following behaviors can be expected:

  1. Possible duplicate records, mainly in the update/delta files
  2. Varied data latency; the SLA remains ≤ 4 hours
  3. Omission of historical request data; customers are advised to pull all historical requests from Canvas Data
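
Because duplicate records are possible, a downstream loader may want to drop repeats before writing to a warehouse. The following is a minimal sketch, not the product's own tooling. It assumes each record is a dict with a hypothetical `key` field identifying the row, plus the `metadata.orderId` value described in the Updates section of this post; the actual file layout may differ.

```python
def dedupe_records(records):
    """Return records with repeated (row key, orderId) pairs removed.

    Assumed (hypothetical) record shape:
        {"key": {"id": 1}, "metadata": {"orderId": "..."}, ...}
    The first occurrence of each change is kept and input order is preserved.
    """
    seen = set()
    unique = []
    for rec in records:
        # One change is identified by the row key together with its orderId.
        ident = (tuple(sorted(rec["key"].items())), rec["metadata"]["orderId"])
        if ident in seen:
            continue  # same change delivered more than once
        seen.add(ident)
        unique.append(rec)
    return unique
```

Keying on both the row and the order ID means two genuine changes to the same row are both kept, while a repeated delivery of one change is dropped.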

Authentication and Authorization (authn&authz)

The Canvas LMS supports the Canvas Data 2 authn & authz mechanism, which means customers can use Canvas access tokens to access the Canvas Data 2 API and command line interface (CLI).

API and CLI

While we plan to support both the API and the CLI, we strongly recommend the CLI. It allows customers to quickly and efficiently filter data at the sub-command level before downloading it, which helps to avoid complex API logic.

 

Unique Datasets 

Canvas Data 2 unique datasets will answer the majority of the needs our customers voiced during the community survey conducted by our product management team. Here are some of them:

 

  • Modules
  • New Quizzes *
  • Account: roles, account users
  • Rubrics
  • Outcomes
  • Originality Reports [plagiarism-related data]
  • Conversations
  • Attachments
  • Master Courses
  • Wikis
  • Developer Keys
  • Calendar
  • Catalog *
  • Faculty Journal 
  • User Access Tokens, metadata only [for developer-level tokens]
  • LTI tools data
  • User Asset Access [user logon data]

 

* Features not included in the initial beta release; they will most likely be rolled out approximately six months afterward.

Schema Versioning and Documentation 

Canvas Data 2 documentation will be hosted on the Instructure API Documentation page: https://canvas.instructure.com/doc/api/.

 

The Canvas Data 2 public schema will be versioned; any update (addition or deletion) will create a new version. Customers will also be able to view the beta version of the Canvas Data 2 schema, which lets them see new changes before those changes are released to Canvas production. This behavior is not to be confused with accessing Canvas Data 2 directly in the beta environment, which will not be supported.

 

Operational Data: Weblogs aka Request Table

Canvas Data 2 will continue to offer access to the weblogs dataset with latency ≤ 4 hours. Granular data filtering (e.g., by request_id, request timestamp, or user_id) prior to download is a highly desired feature and is currently undergoing additional research.
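
Until filtering before download is available, the same narrowing can be done locally once the weblogs are on disk. The sketch below is one hedged way to do it. It assumes rows have already been parsed into dicts and that the columns are named `user_id` and `timestamp` (ISO 8601 without a time zone suffix); these names are assumptions for illustration and should be checked against the published schema.

```python
from datetime import datetime


def filter_requests(rows, user_id=None, start=None, end=None):
    """Yield request rows matching an optional user and time window.

    `rows` is an iterable of dicts; `start` is inclusive and `end` is
    exclusive. Column names here are assumed, not taken from the schema.
    """
    for row in rows:
        if user_id is not None and row.get("user_id") != user_id:
            continue
        ts = datetime.fromisoformat(row["timestamp"])
        if start is not None and ts < start:
            continue
        if end is not None and ts >= end:
            continue
        yield row
```

Because the function is a generator, it can stream over a large file row by row instead of loading everything into memory.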

Transactional Data: Updates aka Deltas

Canvas Data 2 will provide access to all changes occurring in a specific dataset within a default or custom timeframe. A user will be able to provide the starting date and time as a custom parameter. The Updates file will contain a log of transactions, each containing metadata.orderId and metadata.status. The metadata.orderId is a lexicographically sortable ULID that represents the order of the change in the source database. A user could use a record's metadata.orderId to request all changes that happened since that record was updated. Updates files will only be available in Canvas Data 2 for 60 calendar days.
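
Since metadata.orderId sorts lexicographically, a plain string sort is enough to replay an Updates file in source order. The following sketch illustrates the idea under stated assumptions: each update is a dict with a hypothetical `key.id` and `value`, and `metadata.status` takes the hypothetical values `"upserted"` and `"deleted"`. The real status values and record layout are not given in this post.

```python
def apply_updates(current, updates):
    """Replay updates in orderId order on top of `current`.

    Returns (new_state, last_order_id). `last_order_id` can be stored as a
    checkpoint for the next incremental request. Record shape and status
    values are assumptions made for this example.
    """
    state = dict(current)
    last_order_id = None
    # ULIDs compare correctly as plain strings, so no decoding is needed.
    for rec in sorted(updates, key=lambda r: r["metadata"]["orderId"]):
        row_id = rec["key"]["id"]
        if rec["metadata"]["status"] == "deleted":  # hypothetical value
            state.pop(row_id, None)
        else:
            state[row_id] = rec.get("value")
        last_order_id = rec["metadata"]["orderId"]
    return state, last_order_id
```

Replaying in sorted order also makes repeated deliveries of the same change harmless, because applying an identical update twice leaves the state unchanged.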

Schema Changes

Canvas Data 2 introduces a new data schema that is closely aligned with the Canvas API schema. The following additional schema details will be introduced in the new product:

  1. Nullable indicator
  2. Source data type [Postgres as the source database]
  3. Hive data type
  4. Foreign key
  5. Possible restricted values for fields such as workflow_state and context_type

JSON Data fields

Some data fields, such as student quiz responses, are stored in YAML-typed fields in the source database. These fields will be released as JSON-formatted fields.
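
When such a field arrives inside a flat export (for example, one column of a CSV row), it will typically need to be decoded before use. Here is a minimal sketch, assuming the field is delivered as a JSON string; the field name `submission_data` used in the usage note is hypothetical.

```python
import json


def parse_json_field(record, field):
    """Decode a JSON-formatted string field from a record.

    Returns None when the field is missing or empty, otherwise the decoded
    Python object. Raises json.JSONDecodeError on malformed input.
    """
    raw = record.get(field)
    if raw is None or raw == "":
        return None
    return json.loads(raw)
```

For instance, `parse_json_field(row, "submission_data")` would turn the stored text into a dict or list that can be queried directly.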

 

 

Canvas Data 2 Release Timelines 

 

August 2020: Canvas Data 2 early preview [access to sandbox data] 

Users in all regions can use Canvas Data 2 tools [CLI only] to learn the new schema and CLI commands. Access to customer-specific data will not be available.

 

Q3/Q4 2020: Canvas Data 2 public beta [access to production data]

Users in all regions can use Canvas Data 2 tools (API and CLI) to explore their own data.

 

Specific dates for both the early preview and the public beta will be provided when available.

 

Canvas Data Deprecation

As soon as Canvas Data 2 is released to beta and our team reviews feedback, we will announce the deprecation of Canvas Data. We are allocating six months for our customers to migrate from Canvas Data to Canvas Data 2 prior to us turning the old solution off. An official announcement will be made in advance to inform customers of the deprecation timeline. 

Note: Canvas Data services hosted through Instructure Professional Services will be updated prior to Canvas Data end of life.

Canvas Data to Canvas Data 2 Migration Plan

We anticipate the majority of our Canvas Data customers will create a plan for migrating from Canvas Data ahead of time. The plan will depend on the complexity of the customer's custom data warehouse and analytics implementation.

Major Migration Differences

Customer attention will be required, as the two versions of Canvas Data include the following major differences:

  1. New API routes
  2. New CLI tool 
  3. Authentication and Authorization mechanism
  4. Schema: the Canvas Data star schema will be removed, while the Canvas Data 2 schema will be closely aligned with the Canvas API schema
  5. Global object identifiers: Canvas Data 2 will not support global object identifiers but will provide the data points used in Canvas Data to construct global object IDs
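
For customers who still need global object IDs after migrating, the value can be rebuilt from its parts. The sketch below assumes the convention used in the open-source Canvas LMS code, where a global ID is the shard ID multiplied by 10^13 plus the local ID. Which fields Canvas Data 2 exposes for this purpose is not stated here, so `shard_id` and `local_id` are placeholder inputs to confirm against the schema documentation.

```python
# Assumed multiplier, matching the open-source Canvas convention.
IDS_PER_SHARD = 10_000_000_000_000


def to_global_id(shard_id, local_id):
    """Combine a shard ID and a local object ID into a global ID."""
    if not 0 <= local_id < IDS_PER_SHARD:
        raise ValueError("local_id out of range for a single shard")
    return shard_id * IDS_PER_SHARD + local_id


def split_global_id(global_id):
    """Return (shard_id, local_id) for a global ID."""
    return divmod(global_id, IDS_PER_SHARD)
```

Keeping both directions available helps when joining older Canvas Data tables that store global IDs against Canvas Data 2 tables that store local ones.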

Migration Options

Customer migrations could include the following options:

  1. Write a full integration with Canvas Data 2 while maintaining an active Canvas Data integration
  2. Store Canvas Data 2 tables in the same data warehouse without creating a schema naming conflict. Canvas Data 2 tables have different naming conventions (e.g., Canvas Data submissions_dim and Canvas Data 2 submissions)
  3. Leverage Canvas Data 2 in all new reports and dashboards
  4. Update existing reports with new Canvas Data 2 tables. Both Canvas Data and Canvas Data 2 contain the canvas_id of the object. Using both schemas to support a single view, report, or dashboard could be challenging because of differences in data latency
  5. Introduce Canvas Data 2 weblogs as soon as new routes are in place to pull data. The weblogs schema will remain unchanged. Note: weblogs will most likely be available from the moment Canvas Data 2 is enabled in the customer environment. However, all historical requests should be pulled from Canvas Data prior to switching to Canvas Data 2

Migration Documentation

The following documentation will be made available to customers to assist with migration:

  1. Canvas Data 2 Schema
  2. Canvas Data to Canvas Data 2 schema map
  3. Canvas Data 2 API and CLI documentation

Migration Questions

We know more questions may exist about the Canvas Data 2 migration. Questions may be asked using the Comments section of this blog post.


To request the schema for Canvas Data 2 and provide feedback about potential migration needs, please reach out to us via the Canvas Data 2 Request Form.