Canvas Data 2 is coming 😍

The content in this blog is over six months old, and the comments are closed. For the most recent product updates and discussions, you're encouraged to explore newer posts from Instructure's Product Managers.

Edina_Tipter
Instructure Alumni


We are very excited to share that Canvas Data 2 has evolved from infancy into adolescence. We started in June last year with input and feedback from 13 Alpha customers in two regions, and have since grown to over 100 Beta customers around the world. The product will mature further once Canvas Data 2 is made publicly available in March. At full maturity, Canvas Data 2 will be a modern data access platform that gives developers efficient and flexible bulk access to data from various Instructure products, with high fidelity and low latency.

Progress

 

During the Alpha and Beta phases, we have focused on the following:

  • Listening to and incorporating customer feedback (the wish list is not exhaustive, so further changes and improvements are ongoing). We have great experts and collaborators in the Beta group who help us understand customers’ perspectives, pain points, and the data journey “over the hedge” to build the right tool in the right way.
  • Platform improvement, stability, and scalability. When we started the Alpha phase, we only had a walking skeleton and a goal of achieving an MVP (minimum viable product) by the Beta phase. We at Instructure see this first platform piece as a cornerstone that we can build on in the future, so it’s important that we make it robust and reliable. Because robustness and scalability are paramount, most of our energy goes into improving, stabilising, and scaling all components of CD2 by the General Availability (GA) date.
  • Performance optimisation. As our customer base has grown and data volumes have increased, we have learned where we need to optimise.
  • Testing. We've built a testing framework that includes an expanding unit testing suite and daily end-to-end testing and monitoring according to industry-standard practices. This framework will help us monitor the pipeline's functionality and ensure that we don't disrupt working features when we release updates.
  • Sustainability, monitoring and alerting: Software issues occur from time to time. When they do, we want to make sure you feel confident that we’re addressing them. That is why we have focused on having more than just basic monitoring and alerting for important KPIs. 
  • Documentation: During the past few months, we have worked to expand the documentation with examples, complementary documentation, videos, and much more, to give a quick start both to institutions that will be using CD2 as their first big data solution with Instructure and to those migrating from CD1. You might want to view our OpenAPI specification and the links referenced inside. All documentation, including videos, will be shared in the Community space once we release CD2 to production.
  • Integrations: Between the Alpha and Beta phases, CD2 integrated with the Identity Management Service and API Gateway as part of our platform vision for a more streamlined customer/developer experience. More specifically, all API calls now go through an API gateway service, while authentication/authorisation is managed through our brand new identity management service. This integration opens up new possibilities for a unified API look and feel. In the short term, it also provides improved API key management and secure API key sharing between an institution and its partners or providers.
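Since authentication goes through the identity service, and the API usage workflow later in this post mentions a JWT access token, a client will usually want to know when its token is about to expire. The sketch below is only an illustration of that idea: it assumes the token carries the standard `exp` claim, it does not verify the signature, and the function names `jwt_expiry` and `needs_refresh` are made up for this example rather than part of any CD2 tooling.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (seconds since the epoch) from a JWT payload.

    The signature is NOT verified; this only reads the claim so that a
    client can decide when to request a fresh token.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload_b64 = parts[1]
    # base64url segments omit padding, so restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return int(payload["exp"])


def needs_refresh(token: str, now: Optional[float] = None, margin_s: int = 60) -> bool:
    """True when the token expires within `margin_s` seconds of `now`."""
    current = time.time() if now is None else now
    return jwt_expiry(token) - current <= margin_s
```

Calling `needs_refresh(token)` before each request and re-authenticating when it returns `True` avoids sending a request with a token that is about to lapse. The 60-second margin is an arbitrary choice for the example.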

Release Timeline and CD1 Sunset Plans

 

Canvas Data 2 will be released no later than the end of March. In terms of datasets, this initial release will contain this set of Canvas data tables. We are aware that many customers leverage Apache weblogs (requests tables) and Catalog data. Those datasets will be added in the second quarter of the year. Nevertheless, we encourage all users to start planning their transition as soon as Canvas Data 2 becomes available. Because the CD1 and CD2 pipelines are not compatible and the schema has also changed, consuming table deltas requires changes to your ETL. For more details, please see the CD1-CD2 comparison below.

To assist customers with the transition, we are planning to provide a reference solution for downloading and importing data into a database. Furthermore, a data mapping sheet is being prepared to explain the CD1 to CD2 schema differences in case you need to remap existing reports and dashboards. Both are works in progress, and we hope to release them by the end of March.
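To give a feel for the ETL change, the general shape of importing table deltas (including deleted records) into a database can be sketched as follows. This is a minimal illustration and not the reference solution mentioned above: the record layout (`key`, `value`, and `meta.action` with `"U"` for upsert and `"D"` for delete), the `local_copy` table, and both function names are assumptions made for the example, so check the actual CD2 output format before relying on any of it.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database holding one local copy of a table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS local_copy ("
        "id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def apply_changes(conn: sqlite3.Connection, records: list) -> None:
    """Apply a batch of change records: upsert live rows, remove deleted ones."""
    with conn:  # commit the whole batch, or roll it back on error
        for rec in records:
            row_id = rec["key"]["id"]
            if rec["meta"]["action"] == "D":  # hypothetical delete marker
                conn.execute("DELETE FROM local_copy WHERE id = ?", (row_id,))
            else:
                conn.execute(
                    "INSERT OR REPLACE INTO local_copy (id, data) VALUES (?, ?)",
                    (row_id, json.dumps(rec["value"])),
                )
```

Applying each batch inside a single transaction means a failed import leaves the local copy at the previous consistent state, which matters once imports run unattended.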

Given the release of our new data pipeline, the target date for sunsetting CD1 is the end of 2023. By that date, we expect all CD1 customers to have transitioned to Canvas Data 2 to benefit from the new feature set and fresher data.

* Customers may opt to use Instructure Professional Services to perform data warehousing services at a cost. Customers who have purchased Hosted Data Services will have their data warehouse transitioned automatically. Migration support for queries, integrations, or consulting can be purchased for an additional fee.

Onboarding

 

Customer onboarding requires loading an institution’s data into the data lake so that it can be consumed via the CD2 API or CLI. Onboarding for CD2 will happen in a phased manner:

  • On the GA date we will start onboarding customers actively using CD1. Those users will be notified by their CSM as soon as their institution has been added. From that time onwards they will be able to query their institution’s data.
  • For those who haven’t leveraged CD1 but want to work with CD2, we will define a separate workflow for how to request access. This is a work in progress on our end and I will share the process in my next blog post.

What is CD2? (For those who haven’t already heard...)

 

The Canvas Data 2 offering is a service that enables institutions to download their raw data across various Instructure products. It is a revamp and expansion of our “Canvas Data” offering. Its purpose is to allow institutions’ IT and data teams to retrieve LMS data in bulk and keep it up to date (≤ 4 hours data freshness). The data can be used to conduct research and to build custom reports, dashboards, and tools that meet the unique needs of the institution. It gives access to high-fidelity source data and is more granular than the existing Canvas Data 1 star schema. It is also worth noting that Canvas Data 2 as a product doesn’t provide users with custom data request tooling. In other words, there is no reporting engine on top of the data to produce custom data extracts. Canvas Data 2 has a defined relational schema (as opposed to the Canvas Data 1 star schema), which dictates what is available in each file.

API Usage Workflow

  1. Create an API key via the Identity service to access the CD2 API
  2. Request a JWT access token to authenticate and get access to your root account’s data
  3. Trigger your first snapshot
  4. Chain it to your next incremental query 
  5. Continue chaining incremental queries to get the latest changes
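The chaining in steps 3 to 5 can be sketched as below. The network calls are deliberately left out: `run_query` stands in for whatever function actually talks to the CD2 API, and `QueryResult`, `as_of`, and `TableSync` are hypothetical names invented for this example. The only idea being illustrated is that each query returns a point in time, and the next incremental query starts from it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QueryResult:
    records: list
    as_of: str  # point in time up to which this result is complete


# Hypothetical transport: since=None asks for a full snapshot,
# any other value asks for the changes after that point in time.
RunQuery = Callable[[str, Optional[str]], QueryResult]


class TableSync:
    """Keeps the cursor that links one query to the next."""

    def __init__(self, run_query: RunQuery, table: str) -> None:
        self.run_query = run_query
        self.table = table
        self.cursor: Optional[str] = None  # None until the first snapshot

    def pull(self) -> list:
        """First call: snapshot. Later calls: incremental since the last result."""
        result = self.run_query(self.table, self.cursor)
        self.cursor = result.as_of  # chain the next query to this one
        return result.records
```

In practice the cursor would be persisted between runs (in a file or a database row) so that a restarted job continues with an incremental query instead of pulling a new snapshot.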


High-Level Comparison between CD1 and CD2

| Features | Canvas Data | Canvas Data 2 |
|---|---|---|
| Latency (data freshness) | 24 – 48 hours | ≤ 4 hours |
| Table snapshot | | |
| Table deltas (incremental query), includes deleted records | X | |
| CLI | | |
| API | | |
| UI downloads | | X |
| Schema | Star schema | Relational schema |
| Schema versioning | | |
| Available in all regions | | |
| Canvas LMS data | 65 dimensions | 90 unique datasets |
| Multiple file formats | tsv flat files | json ✓, csv ✓, tsv ✓, parquet ✓ |

Features/data not included in the initial release (GA) but which we are considering for future releases in 2023:

  • Weblogs (aka requests): target Q2
  • Catalog data: target Q2
  • New Quizzes: TBC
  • Mobile data: TBC
  • Pageviews: TBC

 

Let our platform journey begin.
