Canvas Data 2 is getting released 🎊🤘

Edina_Tipter · ‎03-09-2023


The day is fast approaching when customers can opt in to Canvas Data 2. It’s been a major development effort, and the big day is finally set: the 18th of March. YEEEY. 🥳 Read on to learn more about this exciting news and what it means for your institution.

The Onboarding Process and Timing

Onboarding is the process of loading your institution’s data into a data lake to be accessed via the Canvas Data 2 API and CLI. Instructure will be onboarding customers in phases throughout the next few months. Timing will depend on customers’ adoption of Canvas Data 1 (CD1).

For active Canvas Data 1 users

Onboarding has already begun for some active CD1 customers so that they can start querying their Canvas LMS data on March 18th. Onboarded customers will be notified by their Customer Success Managers (CSMs) when they can query the data via CD2. Onboarding will continue until all active CD1 institutional data is loaded into the CD2 data lake, which we expect to complete by the end of May (subject to change).**

For those who are not using Canvas Data 1

Users who have not used CD1 but want to work with CD2 can request access through their CSM. Onboarding takes approximately two weeks from the request.

Important: Canvas Data 2 is not available in beta/test instances. Therefore, data added to the data lake will come from the Production environment.

Transitioning: How to Transition from Canvas Data 1 to Canvas Data 2

Transitioning is the process of modifying your extract, transform and load (ETL) pipelines, as well as your reporting and analytic applications, to access CD2 instead of CD1. CD2 has been built from the ground up on more scalable and performant technologies, which means that some changes to your ETL and data models are required. Before you start the implementation, identify the in-house resources and competencies needed for the transition and allocate time for the project.

We offer the following guidance:

Consult the documentation, which will be published soon in the Community under the Canvas Admin guides.
Try the product and review the API workflow, in particular how to chain queries to get consistent updates.
Study the new data schema and pick the tables you need before you redesign your ETL.
As the data schema has changed significantly, review the CD1 to CD2 schema mapping sheet to understand how to remap existing reports more efficiently.
Update the existing scripts and data models that support your analytic applications (See this blog post for a high-level CD1-CD2 comparison).
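The idea behind chaining queries can be sketched as follows: each incremental request starts from the point where the previous one ended, so no change is missed or applied twice. This is a minimal in-memory illustration, not the CD2 API. The function `fetch_changes`, the `since`/`until` markers, and the sample change log are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Change = Tuple[int, str, str]  # (sequence number, record key, record value)

# Hypothetical change log standing in for the data lake.
CHANGE_LOG: List[Change] = [
    (1, "a", "v1"),
    (2, "b", "v1"),
    (3, "a", "v2"),
    (4, "c", "v1"),
]


def fetch_changes(since: Optional[int], limit: int = 2) -> Tuple[List[Change], Optional[int]]:
    """Return up to `limit` changes after `since`, plus the marker for the next call."""
    start = since or 0
    batch = [change for change in CHANGE_LOG if change[0] > start][:limit]
    until = batch[-1][0] if batch else since
    return batch, until


def sync_all(fetch: Callable[[Optional[int]], Tuple[List[Change], Optional[int]]]) -> Dict[str, str]:
    """Chain requests: the `until` of one response becomes the `since` of the next."""
    state: Dict[str, str] = {}
    since: Optional[int] = None
    while True:
        batch, until = fetch(since)
        if not batch:
            break
        for _, key, value in batch:
            state[key] = value  # later changes overwrite earlier ones
        since = until
    return state
```

In a real ETL job the last marker would be persisted between runs, so that the next run resumes where the previous one stopped.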

Reference implementation in CLI

To save you time and ease the transition effort, we offer a set of bonus features in the CLI to insert the data into a PostgreSQL database. You may use it as is or adjust it to your choice of data store. The tool will provide three new commands:

initdb (populates an empty PostgreSQL database from a snapshot),
syncdb (applies the inserts, updates, and deletes returned by the incremental query to the existing records),
dropdb (deletes a table from the database that was previously created with initdb).

CLI commands for loading data to a Postgres DB

dap initdb --help

Example: dap initdb --db-connection-string postgresql://postgres:postgres@localhost/postgres --namespace canvas --table accounts

dap syncdb --help

Example: dap syncdb --db-connection-string postgresql://postgres:postgres@localhost/postgres --namespace canvas --table accounts

dap dropdb --help

Example: dap dropdb --db-connection-string postgresql://postgres:postgres@localhost/postgres --namespace canvas --table accounts
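If you need to run these commands for several tables, a small wrapper can assemble them. The sketch below only uses the command names and flags shown in the examples above. The helper names `build_dap_command` and `plan_commands` are hypothetical, and the `courses` table is just an illustrative second entry. It prints the commands rather than executing them.

```python
import shlex
from typing import List

ALLOWED_ACTIONS = {"initdb", "syncdb", "dropdb"}


def build_dap_command(action: str, connection_string: str, namespace: str, table: str) -> List[str]:
    """Assemble one dap invocation as an argument list (hypothetical helper)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return [
        "dap", action,
        "--db-connection-string", connection_string,
        "--namespace", namespace,
        "--table", table,
    ]


def plan_commands(action: str, connection_string: str, namespace: str, tables: List[str]) -> List[List[str]]:
    """Build one command per table, in the order given."""
    return [build_dap_command(action, connection_string, namespace, t) for t in tables]


if __name__ == "__main__":
    conn = "postgresql://postgres:postgres@localhost/postgres"
    # "courses" is an illustrative table name, not taken from the post.
    for cmd in plan_commands("syncdb", conn, "canvas", ["accounts", "courses"]):
        print(shlex.join(cmd))
        # To execute for real: subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems if a connection string contains special characters.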

The tool requests the data in JSON format and uses a prepared INSERT … ON CONFLICT … DO UPDATE SET statement to insert each record or, if a record with the same key already exists, update it.
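To illustrate the insert-or-update pattern, here is a minimal sketch. It is not the CLI's own code: it uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs without a database server (SQLite 3.24 or later accepts the same ON CONFLICT clause), and the `accounts` table with `id` and `name` columns is a simplified, hypothetical schema.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Insert a row; if a row with the same primary key exists, update it instead.
UPSERT_SQL = """
INSERT INTO accounts (id, name)
VALUES (?, ?)
ON CONFLICT (id) DO UPDATE SET name = excluded.name
"""


def upsert_accounts(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]]) -> None:
    """Apply a batch of rows with the same parameterized statement."""
    conn.executemany(UPSERT_SQL, rows)
    conn.commit()


def demo() -> List[Tuple[int, str]]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table; the real CD2 schema has more columns.
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    upsert_accounts(conn, [(1, "Main"), (2, "Sub")])         # initial load
    upsert_accounts(conn, [(2, "Sub renamed"), (3, "New")])  # later batch
    rows = conn.execute("SELECT id, name FROM accounts ORDER BY id").fetchall()
    conn.close()
    return rows
```

After the second batch, account 2 is updated in place and account 3 is added, so re-applying the same batch leaves the table unchanged.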

The tool will initially support only PostgreSQL. Internally, it depends on the Python libraries SQLAlchemy (which provides an object-relational mapping) and asyncpg (the PostgreSQL database client library).

Customers who need consultancy during their transition, or who simply prefer to have warehousing services performed for them, can use Instructure Hosted Data Services for a fee. Customers who have already purchased Hosted Data Services will have their data warehouse transitioned automatically. Additional migration support for queries, integrations, or consulting can be purchased.

Go Live!

Once your ETL is fully functional, we recommend running CD1 and CD2 in parallel for a short period to verify that they return the expected results. Instructure is committed to supporting CD1 until the end of the year to ensure that customers have the time they need to effectively transition to this faster and more powerful data access platform.
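One simple way to check a parallel run is to reconcile the same entity from both pipelines by key. The sketch below is only an illustration under stated assumptions: `compare_datasets` is a hypothetical helper, and it presumes you have already normalized the CD1 and CD2 rows into a common shape (a dictionary keyed by record ID), since the schemas differ.

```python
from typing import Any, Dict, List, Mapping


def compare_datasets(cd1_rows: Mapping[int, Any], cd2_rows: Mapping[int, Any]) -> Dict[str, List[int]]:
    """Report keys missing on either side and keys whose values differ."""
    cd1_keys, cd2_keys = set(cd1_rows), set(cd2_rows)
    return {
        "only_in_cd1": sorted(cd1_keys - cd2_keys),
        "only_in_cd2": sorted(cd2_keys - cd1_keys),
        "mismatched": sorted(
            key for key in cd1_keys & cd2_keys if cd1_rows[key] != cd2_rows[key]
        ),
    }


if __name__ == "__main__":
    # Illustrative sample data, not real Canvas records.
    cd1 = {1: "Main", 2: "Sub", 3: "Old"}
    cd2 = {1: "Main", 2: "Sub renamed", 4: "New"}
    print(compare_datasets(cd1, cd2))
```

An empty report for every table you rely on is a reasonable signal that the new ETL is ready. Some differences may simply reflect the two sources being refreshed at different times.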

Let our data platform journey begin. R-r-r-r-ready s-s-steady go! 🤸

* Gaining access to CD2 will not automatically remove your access to the Canvas Data 1 Portal.

** If you need to be onboarded before the end of May, please contact your CSM to add your name to the “priority lane”. Average onboarding time should not exceed two weeks, but this can vary based on the number of incoming requests.