Further updates on Canvas Data 2

The content in this blog is over six months old, and the comments are closed. For the most recent product updates and discussions, you're encouraged to explore newer posts from Instructure's Product Managers.

Edina_Tipter
Instructure Alumni

As promised, this blog post shares further news on our progress with Canvas Data 2. In my previous post I explained why we needed to re-architect our data pipeline and what we are working on. This time, I would like to give you insight into the product's progress, an overview of the features, what will and will not be part of the initial release, and a high-level overview of the timeline. At the end of the post, I ask you to please fill in a survey, because we want to learn your needs and make CD2 not just a better version of CD1 but an amazing user experience that surpasses your expectations.

CD2 Progress

Q4 is not over yet, but so far we are on track and progressing according to plan, which brings us closer to the product release. We've gathered feedback from our tester groups as well as from internal and external architecture experts. Beyond the initial release, this feedback will help ensure we have built a foundation that scales to all current and future use cases. We want it to stand the test of time and be a reliable, stable, and scalable data platform for our customers.

From a technological perspective, we are switching from full database snapshots to structured streaming. With incremental updates, you will be able to get the most recent changes to the data, with data fresh to within approximately 4 hours, as opposed to more than 24 hours with CD1. Backed by open table formats (Apache Hudi/Delta Lake) and a data catalog (Hive Metastore), CD2 will offer a more streamlined approach to importing data into ETL pipelines or data warehouses.

CD2 Initial Features

To start, we would like to release the following minimum feature set for you to begin using in your accounts. I have also added a few items planned for later versions to give you a preview of our future endeavours.

Part of initial release

Table snapshots

Captures the most recent state of the account's table data, including all records captured in Canvas LMS for that dataset from the moment the account was established to the moment the data was requested.

Table deltas/updates

Captures the most recent changes (deltas) to the specified tables since the provided time (not more than 60* days).

Example:

Edina_Tipter_0-1636532041214.png

 

Latency ≤ 4 hours

A big improvement over CD1, where latency was more than 24 hours.

New schema

Canvas Data 2 introduces a new data schema that is closely aligned with the Canvas API schema.

Schema versioning

The Canvas Data 2 public schema will be versioned; any updates (additions and deletions) will create a new version. 

File format

Most likely CSV* to start, with JSON and Parquet as other possible options.
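
To make the snapshot-plus-deltas idea above concrete, here is a minimal Python sketch of how a consumer might keep a local copy of a table current. The record shape, the `op` field, and the function names are assumptions made for illustration only; the post does not describe CD2's actual delta format. The 60-day window comes from the feature list and is marked there as subject to change.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the feature list above; the post marks it as subject to change.
MAX_DELTA_WINDOW = timedelta(days=60)


def can_use_deltas(since, now):
    """Return True if `since` is recent enough to request deltas instead of a full snapshot."""
    return now - since <= MAX_DELTA_WINDOW


def apply_deltas(snapshot, deltas):
    """Merge change records into a snapshot keyed by primary key.

    Each delta is a dict with hypothetical fields:
      "id"  - primary key of the affected row
      "op"  - "upsert" or "delete"
      "row" - the full row (present for upserts only)
    """
    merged = dict(snapshot)  # leave the caller's snapshot untouched
    for change in deltas:
        if change["op"] == "delete":
            merged.pop(change["id"], None)
        else:
            merged[change["id"]] = change["row"]
    return merged


snapshot = {
    1: {"id": 1, "name": "Intro to Biology"},
    2: {"id": 2, "name": "Algebra I"},
}
deltas = [
    {"id": 2, "op": "upsert", "row": {"id": 2, "name": "Algebra I (Fall)"}},
    {"id": 3, "op": "upsert", "row": {"id": 3, "name": "World History"}},
    {"id": 1, "op": "delete"},
]
current = apply_deltas(snapshot, deltas)

now = datetime(2021, 11, 10, tzinfo=timezone.utc)
recent_enough = can_use_deltas(now - timedelta(days=7), now)
too_old = can_use_deltas(now - timedelta(days=90), now)
```

In this sketch a consumer whose last sync falls outside the window would fall back to requesting a fresh table snapshot rather than deltas.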

Coming in future releases (not in priority order)

Weblogs aka requests

Live events

Catalog data

New quizzes

Outcomes data

* subject to change based on research

CD2 Timeline

In terms of release dates, our tentative timeline is as follows. Because we have a handful of early access customers, we would like to transition them to the new solution first, in the first half of 2022, while preparing an open beta for everyone else. These tasks will require more thorough preparation of documentation and release notes, which will be our second milestone.

Based on the feedback we receive from the early access customers, we will resolve any issues found and hold a confidence vote on our general availability date, which is expected to fall in the second half of the year. By then, we will have an onboarding plan for customers, a constantly monitored data pipeline, regularly running tests, and a trained support team, all in place directly after our very first production release. These are really exciting times for us and we hope that you are equally excited.

Please Share Your Feedback

As always, our customers are our number-one priority, so we invite you to fill in this survey and help us build the right thing for your data-related use cases. It's been a while since the original requirements were defined, so we want to see whether we need to tweak anything to better serve your needs, and to confirm your expectations for our Canvas Data 2 solution.

The survey takes about 7 minutes to complete and will be open for 14 days, from the 10th to the 24th of November. Click here to start the CD2 survey.

Just a reminder that the survey will be closed tomorrow, so there is one more day to fill it in 😉

The survey was CLOSED. A big THANK YOU to all contributors.🙌

9 Comments
brian_mullins
Community Participant

@Edina_Tipter Thanks for the update. I think we are all excited to see the progress.

I'd like to suggest a series of blog posts that detail some of the items presented here. For example, I'd love to hear a deeper explanation on the 4-hour latency, specifically detailing how and if that ensures the same latency across all entities.


Edina_Tipter
Instructure Alumni

Hi @brian_mullins, thank you for the suggestion. Since you had a specific question, I will try to answer it here rather than wait for my next blog post.

We cannot guarantee primary/foreign key matching at all times across all entities because of the streaming nature of the solution. That same streaming design, however, is what lets us provide low-latency updates that you can use to get the most recent changes. So if you wanted to join two tables and found a mismatch in one of the data packages, that issue will most probably be resolved by the next query for the same entities. I hope this answers your question.
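
As an illustration of the point above, here is a minimal Python sketch of one way a consumer could tolerate a temporary key mismatch: rows whose foreign key cannot be resolved yet are set aside and retried after the next update. The table names, field names, and the `join_available` helper are hypothetical and are not part of CD2.

```python
def join_available(children, parents, fk):
    """Join child rows to parent rows; set aside children whose parent has not arrived yet."""
    joined, pending = [], []
    for child in children:
        parent = parents.get(child[fk])
        if parent is None:
            pending.append(child)  # foreign key cannot be resolved in this batch
        else:
            joined.append({**child, "assignment_title": parent["title"]})
    return joined, pending


# First batch: submission 101 references assignment 11, which has not streamed in yet.
assignments = {10: {"id": 10, "title": "Essay 1"}}
submissions = [
    {"id": 100, "assignment_id": 10},
    {"id": 101, "assignment_id": 11},
]
joined, pending = join_available(submissions, assignments, "assignment_id")

# Next update: assignment 11 arrives, so the pending row can be retried.
assignments[11] = {"id": 11, "title": "Essay 2"}
retried, still_pending = join_available(pending, assignments, "assignment_id")
joined.extend(retried)
```

Keeping unmatched rows in a pending list, rather than dropping them, means a mismatch in one data package only delays the join until the missing parent row appears.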

dave_perry
Community Explorer

This is a useful update. Only this morning I mentioned to my boss that the sooner we get it, the sooner we can test it (with a view to our MIS team taking over the integration and running queries against it in Power BI).

I notice there is no mention of Canvas Studio data on the timeline. Is that the case?

Edina_Tipter
Instructure Alumni

@dave_perry Happy you liked the update.
Regarding your question: we are planning to add Studio as well, but it is on our longer-term roadmap, after we have rolled out the ones you see listed. My goal with this post was to surface the data sources that have higher priority based on the survey results and customer interviews.

mvanmatre
Community Participant

@Edina_Tipter 
I think this relates - our developer is trying to figure out the HOW:

Reading documentation online about Canvas Data Services and the Canvas Data Portal, I am aware that there is a way to download and sync Canvas data for view in outside tools, such as SQL Server Management Studio with the Canvas CLI tool. However, I have not found where to do so. Can someone point me in the right direction?

Edina_Tipter
Instructure Alumni

Hi @mvanmatre, apologies for not responding earlier; I only just noticed your question. You might have solved this by now, but if not, here are some useful videos that can help.

https://www.youtube.com/watch?v=SVqxNJiLxZ8&t=161s

https://www.youtube.com/watch?v=nLtsnp8rGtg&t=19s

https://www.youtube.com/watch?v=udkJM3B1-Jo

https://www.youtube.com/watch?v=vXhwEtSZ8AE

https://www.youtube.com/watch?v=7RW_TIm4Q3g

IsaacOdeh
Community Member

Hi all,

Not sure if anyone has tried this. I am trying to build an ETL pipeline using Azure Data Factory. It will interact with the Canvas Data API to extract all the assignment entities and dump the tables into an Azure Data Lake Gen2 container.

I have tried checking the API calls using Postman and everything seems to work. I was able to generate an access token using my institution's API key. However, I can't seem to get it working in Azure Data Factory when building the pipeline.

Has anyone in this community tried extracting Canvas data via the API using ADF? Please share your experience.

 

Thanks,

Isaac

dgarciat
Community Member

@Edina_Tipter can you share the equivalent link to this one, but for CD2? https://portal.inshosteddata.com/docs

stimme
Community Coach

@dgarciat CD2's equivalent of CD1's schema docs is at https://api-gateway.instructure.com/doc/