As promised, this blog post shares further news on our progress with Canvas Data 2 (CD2). In my previous post I explained why we needed to re-architect our data pipeline and what we are working on. This time, I would like to give you insight into the progress of the product, an overview of the features, what will and will not be part of the initial release, and a high-level overview of the timeline. At the end of the post, I will ask you to fill in a survey, because we want to learn about your needs and make CD2 not just a better version of CD1 but an amazing user experience that surpasses your expectations.
Q4 is not over yet, but so far we are on track, and the progress we are making according to plan brings us closer to the product release. We've gathered feedback from our tester groups as well as from internal and external architectural experts. Beyond the initial release, this feedback will help ensure that we have built a foundation that scales to all current and future use cases. We want it to stand the test of time and be a reliable, stable, and scalable data platform for our customers.
From a technological perspective, we are switching from full database snapshots to structured streaming. With incremental updates, you will be able to get the most recent changes to the data, with data freshness of approximately 4 hours or less, as opposed to more than 24 hours with CD1. Backed by open table formats (Apache Hudi/Delta Lake) and a data catalog (Hive Metastore), CD2 will offer a more streamlined way to import data into ETL pipelines or data warehouses.
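To make the difference between full snapshots and incremental updates more concrete, here is a minimal sketch in Python. It is not the CD2 API: the record layout (`op`, `id`, `row`) and the function name `apply_changes` are assumptions made purely for illustration. The idea is that a consumer keeps the last known state of a table and applies only the changes received since then, instead of reloading everything.

```python
from typing import Dict, Iterable

Row = Dict[str, object]


def apply_changes(snapshot: Dict[int, Row], changes: Iterable[dict]) -> Dict[int, Row]:
    """Return a new table state after applying change records in order.

    Each change record is a hypothetical dict with:
      - "op":  "upsert" or "delete"
      - "id":  the primary key of the affected row
      - "row": the new row contents (only for "upsert")
    """
    state = dict(snapshot)  # copy, so the original snapshot stays untouched
    for change in changes:
        key = change["id"]
        if change["op"] == "delete":
            state.pop(key, None)  # ignore deletes for rows we never saw
        else:
            state[key] = change["row"]  # insert a new row or overwrite an old one
    return state


# A full snapshot taken at some earlier point in time (made-up data).
snapshot = {1: {"name": "Algebra"}, 2: {"name": "Biology"}}

# Incremental changes received since that snapshot (made-up data).
changes = [
    {"op": "upsert", "id": 2, "row": {"name": "Biology 101"}},
    {"op": "upsert", "id": 3, "row": {"name": "Chemistry"}},
    {"op": "delete", "id": 1},
]

current = apply_changes(snapshot, changes)
```

With this approach, only the three change records need to be transferred and processed rather than the whole table, which is what makes fresher data practical.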
CD2 Initial Features
As a start, we would like to release this minimum feature set for you to leverage and begin using in your accounts. I have also added a few items that will be included in the next versions to give you some highlights of our future endeavours.
Part of initial release
Captures the most recent state of the account table data, which includes all records captured in Canvas LMS for that dataset from the moment the account was established to the moment the data was requested.
Captures the most recent changes (deltas) to the specified tables since the provided time (not more than 60* days).
Latency ≤ 4 hours
A big improvement over CD1, where latency was over 24 hours.
Canvas Data 2 introduces a new data schema that is closely aligned with the Canvas API schema.
The Canvas Data 2 public schema will be versioned; any updates (additions and deletions) will create a new version.
Most likely CSV* format to start with; JSON and Parquet are other possible options.
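The change-capture item in the list above limits how far back a request for deltas can reach (not more than 60 days, subject to change). A client could check its "since" timestamp before asking for changes, roughly as sketched below. The function name `validate_since` and the constant `MAX_LOOKBACK` are hypothetical and not part of any published CD2 interface; the 60-day value is taken from the list and may change.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 60 days, as stated in the feature list (marked "subject to change").
MAX_LOOKBACK = timedelta(days=60)


def validate_since(since: datetime, now: datetime) -> datetime:
    """Check that a 'since' timestamp is usable for an incremental request.

    Raises ValueError if the timestamp is naive, lies in the future,
    or is older than the allowed lookback window.
    """
    if since.tzinfo is None:
        raise ValueError("'since' must be timezone-aware")
    if since > now:
        raise ValueError("'since' must not be in the future")
    if now - since > MAX_LOOKBACK:
        raise ValueError("'since' is older than the allowed lookback window")
    return since


# Example with fixed, made-up timestamps.
now = datetime(2022, 3, 1, tzinfo=timezone.utc)
ok_since = validate_since(now - timedelta(days=30), now)
```

If the timestamp falls outside the window, a reasonable fallback would be to request the full current state of the table and resume incremental updates from there.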
Coming in future releases (not in priority order)
Weblogs aka requests
* subject to change based on research
In terms of release dates, our tentative timeline follows this release process: because we have a handful of early access customers, we would like to transition them to the new solution first, in the first half of 2022, while preparing an open beta for everyone else. These tasks will require more thorough preparation in terms of documentation and release notes, which will be our second milestone.
Based on the feedback we receive from the early access customers, we will resolve any issues found and hold a confidence vote on our general availability date, which is expected to fall in the second half of the year. By then, we will have an onboarding plan for customers, a constantly monitored data pipeline, regularly running tests, and a trained support team, all directly after our very first production release. These are really exciting times for us, and we hope that you are equally excited.
Please Share Your Feedback
As always, our customers are our number-one priority, so we invite you to fill in this survey and help us build the right thing for your data-related use cases. It's been a while since the original requirements were defined, so we want to see whether we need to tweak anything to better serve your needs, and to gain confidence about your expectations regarding our Canvas Data 2 solution.