Canvas Data 2: API draft specification available for feedback

This blog from the Instructure Product Team is no longer considered current. While the resource still provides value to the product development timeline, it is available only as a historical reference.

Edina_Tipter
Instructure Alumni

First, a big THANK YOU to all who have completed the CD2 survey. I was really excited to see the number of responses growing day by day, which shows great interest in what we are building. Your responses will guide our product decisions to ensure Canvas Data 2 meets your needs. We are also using the results to validate some of our assumptions and to anticipate future improvement needs.

 

API DRAFT specification

Taking into consideration the survey results as well as the input from customer interviews, we have prepared the following draft API specification, which you are invited to review. Please share your feedback in the comments section of this blog post. This version will be open for feedback until we release the next updated version, which I will share in another blog post.

This specification is a draft: although the designed endpoints are close to our MVP* goals for the product, we will consider adjustments depending on your input. Furthermore, we are still validating the schema, which WILL CHANGE to become more accurate, so please don’t start building your processes until it is final.

Given that this is a draft, I won’t be explicitly communicating the exact changes, but I will share the next updated version and will of course respond to your questions. The list of available tables in the API spec is accurate, and these tables will be available to you in CD2.

It is also important to note that some features mentioned in the specification might be rolled out incrementally. I will take the feedback I receive into account when deciding this and will communicate it explicitly in an upcoming blog post. Anything rolled out later will be minor and nothing that would stop anyone from using CD2.

 

CD2 progress and timeline

In terms of progress and timeline, I am grateful to our engineering team: in spite of the unexpected changes and extra tasks they had to take on starting this year, I can happily share that we are still on track with our milestones for 2022. These are rolling out the product to early access (alpha) customers in the first half of this year, and preparing the open beta and General Availability for the second half of this year. This plan wouldn’t be possible without steady management support and the perseverance, expertise, and hard work of the development team.

I am sure that everybody is excited to finally touch and feel CD2, which is getting closer and closer as we speak.

I look forward to your feedback regarding our API draft.


*MVP (Minimum Viable Product) is the version of a new product that allows a team to collect the maximum amount of validated learning about customers with the least amount of effort. (Eric Ries) (Product development doesn’t stop at the MVP level; this is just the first milestone we want to achieve.)


19 Comments
claudiateresita
Community Explorer

Excellent news, thank you!

jacobse
Community Explorer

Will there be a CSV format (like previously available) for the schema so that our dba developers can easily create internal tables?

Thanks,
Ed

 

JimEgan
Community Explorer

Adding to what Ed said above, a previous release provided a spreadsheet version of the schema.  This is critical to our development effort as it brings all the columns and tables into a single document.

jacobse
Community Explorer

Hi,

With the delay in the implementation of Canvas Data 2, I was curious whether there is any opportunity for schools to join the Alpha group; we requested this previously and were not selected. Our university leadership is ready to get access, implement CD2 in our Enterprise Data Warehouse, and start combined reporting with Canvas and Ellucian Banner data.

Thanks,
Ed

Edward Jacobs Jr.
Manager Application Administration and Support
Metropolitan State University of Denver

 

audra_agnelly
Community Champion

I was not aware of a survey. Is it still open? Is there a link somewhere?

dsweeney2
Community Participant

Great! Thanks Edina,

My feedback/questions are:

  1. https://data-access-platform-api.s3.eu-central-1.amazonaws.com/index.html#tag/API/paths/~1query~1{pr... Mentions that snapshots won't include hard-deleted objects. Can you give us an example of what this refers to? I'm assuming regularly deleted objects will still appear in a snapshot? For the incremental query would the behaviour be different for hard vs regular deletions? An incremental query might cover a timeline that includes an edit and a subsequent deletion and we would need to know the state of the deleted object at the time of deletion.
  2. https://data-access-platform-api.s3.eu-central-1.amazonaws.com/index.html#tag/CompleteIncrementalJob I'm confused here: the name of the API type implies that the returned object indicates a complete job, but the example has a status of "waiting". "running" and "failed" are also possible statuses, but wouldn't they be covered by Job or FailedJob? Or is this just a byproduct of the documentation formatting?

Edina_Tipter
Instructure Alumni

@jacobse and @JimEgan 

As per the API spec, we do not plan to share the schema in CSV format, but I don't want to rule out this possibility until I understand the full context. So, in order to help, I need to know why you need it and how it would be helpful. Assuming that you need it to create the tables in your DB, we can look into providing SQL table definitions instead. Would that help?

One example:

CREATE TABLE "Address"(
    id INT GENERATED BY DEFAULT AS IDENTITY,
    city TEXT NOT NULL,
    PRIMARY KEY (id)
);
COMMENT ON TABLE "Address" IS 'An unambiguously identified location.';

CREATE TABLE "Person"(
    id INT GENERATED BY DEFAULT AS IDENTITY,
    family_name TEXT NOT NULL,
    given_name TEXT NOT NULL,
    birth_date TIMESTAMP NOT NULL,
    perm_address_id INT NOT NULL,
    temp_address_id INT,
    PRIMARY KEY (id),
    CONSTRAINT fk_Person_perm_address FOREIGN KEY (perm_address_id) REFERENCES "Address"(id),
    CONSTRAINT fk_Person_temp_address FOREIGN KEY (temp_address_id) REFERENCES "Address"(id)
);
COMMENT ON COLUMN "Person".perm_address_id IS 'Permanent address where the person lives.';
COMMENT ON COLUMN "Person".temp_address_id IS 'Temporary address where the person resides.';

Edina_Tipter
Instructure Alumni

@audra_agnelly Unfortunately it is closed. But I am curious to receive your feedback. Let me figure out something and get back to you. 

Edina_Tipter
Instructure Alumni

@jacobse 

As for participation in the alpha, we can only accept a very limited number of institutions, and the places are filled for now. It is possible, though, that someone will drop out, so I suggest you talk to your CSM and signal this request. You might also want to be put on the open beta list.

Edina_Tipter
Instructure Alumni

@dsweeney2 

To point 1:

Let me share this. The files that you can download via CD2 will contain database table fields as well as some metadata. 

The workflow_state field indicates the state of a record in the Canvas database table. When a record is soft-deleted in Canvas, its workflow_state is set to `deleted`. This is basically equivalent to an update, and you will see it in both the snapshot and the incremental query.

In the case of incremental queries, we also include the ‘Action’ field. Only a handful of processes in Canvas permanently purge data records from the database. ‘Action’=D indicates that a record has been hard-deleted (purged) from the database; if the record was created or updated, then ‘Action’=U. Based on this, I have to admit that the edge case you raised may occur. Given this, I would be curious to learn what kind of conclusions you could draw from such a chain of events, supposing that you trigger incremental queries often enough.
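
To make the soft-delete versus hard-delete distinction concrete, here is a minimal Python sketch of how a consumer might apply incremental records to a local copy of a table. The flat record layout, the `id` key, and the helper names `apply_changes` and `active_rows` are assumptions made for illustration; only the `workflow_state` field and the ‘Action’ values U and D come from the description above, and the real file format may differ.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def apply_changes(table: Dict[int, Record], changes: List[Record]) -> None:
    """Apply incremental records, in order, to a local copy keyed by id.

    Assumption: each record is a flat dict carrying an "id" and an "Action".
    """
    for change in changes:
        action = change["Action"]
        row_id = change["id"]
        if action == "D":
            # Hard delete (purge): the row no longer exists in the source.
            table.pop(row_id, None)
        elif action == "U":
            # Create or update. A soft delete also arrives this way, as an
            # update that sets workflow_state to "deleted".
            table[row_id] = {k: v for k, v in change.items() if k != "Action"}
        else:
            raise ValueError(f"unexpected Action: {action!r}")


def active_rows(table: Dict[int, Record]) -> List[Record]:
    """Return the rows that have not been soft-deleted."""
    return [r for r in table.values() if r.get("workflow_state") != "deleted"]


# Local copy built from an earlier snapshot (illustrative data).
local = {
    1: {"id": 1, "name": "Intro", "workflow_state": "active"},
    2: {"id": 2, "name": "Draft", "workflow_state": "active"},
}
changes = [
    {"id": 1, "name": "Intro", "workflow_state": "deleted", "Action": "U"},
    {"id": 2, "Action": "D"},
    {"id": 3, "name": "New", "workflow_state": "active", "Action": "U"},
]
apply_changes(local, changes)
```

In this sketch the soft-deleted row 1 stays in the local copy with its last known state, while the purged row 2 disappears entirely, so a consumer that needs the final state of purged rows would have to archive them before applying the D record.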

To point 2:

Great point. Thank you for spotting this. Yes, it is the documentation tool that is causing the bad formatting. For a FailedJob you can only get the Failed status. We will check whether we can improve the documentation in this regard.

jacobse
Community Explorer

@Edina_Tipter Thank you. It would be great if the schema could be provided in a SQL create table format, or in a format that we can easily convert. At our institution we use Oracle databases, but I see the sample is in a different database format. If we were able to run the definitions through a converter to produce the correct database format for table creation, I think that would work.

Thanks,

Ed
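
A minimal Python sketch of the kind of converter described above, applied to the sample table definitions earlier in this thread. It rests on assumptions: the only change made is replacing the `TEXT` type, which Oracle does not provide, with `VARCHAR2(4000)` (an illustrative width; `CLOB` may suit long text better), and the identity columns assume Oracle 12c or later. The names `TYPE_MAP` and `to_oracle_ddl` are hypothetical, and the final CD2 schema may need a different mapping.

```python
import re

# Assumed mapping from the sample's types to Oracle types.
# Extend this table once the final schema and its types are known.
TYPE_MAP = {
    "TEXT": "VARCHAR2(4000)",
}


def to_oracle_ddl(ddl: str) -> str:
    """Rewrite column types in a DDL string using TYPE_MAP.

    Limitation: this is a plain text substitution, so an upper-case TEXT
    inside a quoted comment string would be rewritten as well.
    """
    converted = ddl
    for source_type, oracle_type in TYPE_MAP.items():
        # Word boundaries keep names such as TEXT_ID from matching.
        pattern = rf"\b{re.escape(source_type)}\b"
        converted = re.sub(pattern, oracle_type, converted)
    return converted


sample = """CREATE TABLE "Address"(
id INT GENERATED BY DEFAULT AS IDENTITY,
city TEXT NOT NULL,
PRIMARY KEY (id)
);"""

print(to_oracle_ddl(sample))
```

A real converter would more likely parse the schema than patch SQL text, but a substitution table like this is often enough for a first pass over generated definitions.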

 

adam_c_voyton
Community Participant

@Edina_Tipter

The API documentation looks great, very clean format. Looking forward to getting access to Canvas Data 2. 

Does this mean we'll be able to extract real-time data using APIs? Or would the data still need to be extracted, converted, and imported into our reporting system? It would be preferable to have both options, as our pain point with CD1 is the data we can pull and report on is more than 24 hours old. From my understanding, CD2 will have data that's only around 4 hours old. 

Said differently, would the APIs pull data from the last snapshot that occurs every few hours, or would the data it pulls be real-time data based on when the API call runs? 

Edina_Tipter
Instructure Alumni

@adam_c_voyton 

Hi Adam,

Thank you for your feedback. The main goal of CD2 is to provide efficient access to your data to enable you to keep your DBs up to date.

It’s correct that we target a latency of less than 4 hours, meaning that you will receive all data that is 4 hours old or older, plus some that is less than 4 hours old. Putting the monitoring in place is an upcoming task for the team. Overall, because the latency is not sub-minute or sub-second, and so not immediate, I wouldn’t call it real time.

The CD2 solution consumes a data stream rather than data dumps, but as part of the downstream processing some transformations are applied to the data, and it is partitioned before being served to you via the API. That’s why, once you have pulled the first snapshot, you can fetch the latest changes made to a table and apply them to your DB tables incrementally. You don’t need to reprocess all the data each time.

I hope this helps but let me know if you still have questions.

AustinJames
Community Member

Hi.  Thanks for making this available.  As a K12 institution, we heavily use the grade-passback feature to send assignment scores from Canvas to our student information system (SIS).  I was hopeful that the new API might include the ability to fetch the results of grade-passback jobs, but I don't see it.  Am I simply missing it (lti-results?) or is that simply not included?

Edina_Tipter
Instructure Alumni

@AustinJames You are not missing it. At the moment this is not included in the dataset available via the API as it doesn't live in the Canvas DB per se. Nevertheless, I understand the need so I will put this on my list to keep it in mind as we add more data sources to CD2.

djohnson10
Community Explorer

Canvas Data 2 appears to offer more flexibility and does address the issue of having to download full datasets. It will require more work to pull the data and transform it. Has anyone considered the bandwidth impact of moving away from delimited files to JSON? JSON is about double the size of delimited files. For large organizations this will have a real impact.

Edina_Tipter
Instructure Alumni

Hi @djohnson10 

Thank you for your feedback. The new solution will certainly require some adjustment on your end to leverage the benefits of the new pipeline, but this is a change for the better 😉

Actually, we will be offering JSON, but we will also support delimited files such as CSV and TSV (and later Parquet), so you can pick what suits you best. These files will also be compressed with gzip, which will reduce their size further. Only the schema will be available in JSON alone, but it is of negligible size, so I suppose you were more worried about the table downloads.
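
To get a feel for the size question, the following Python sketch serializes the same synthetic rows as CSV and as line-delimited JSON, then compares raw and gzip-compressed sizes. The rows, column names, and helper functions are made up for illustration and say nothing about actual CD2 table sizes; the point is only that the comparison is easy to run against your own data.

```python
import csv
import gzip
import io
import json

# Synthetic rows standing in for a table download (illustrative only).
rows = [
    {"id": i, "user_id": 1000 + i, "workflow_state": "active", "score": i % 100}
    for i in range(5000)
]


def to_csv_bytes(data):
    """Serialize rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(data[0]))
    writer.writeheader()
    writer.writerows(data)
    return buf.getvalue().encode("utf-8")


def to_jsonl_bytes(data):
    """Serialize rows as one JSON object per line (keys repeat on each row)."""
    return "\n".join(json.dumps(row) for row in data).encode("utf-8")


def sizes(payload):
    """Return (raw size, gzip-compressed size) in bytes."""
    return len(payload), len(gzip.compress(payload))


csv_raw, csv_gz = sizes(to_csv_bytes(rows))
json_raw, json_gz = sizes(to_jsonl_bytes(rows))
print(f"CSV : raw={csv_raw} gzip={csv_gz}")
print(f"JSON: raw={json_raw} gzip={json_gz}")
```

Because JSON repeats the key names on every row, its raw size is larger than the CSV, and gzip tends to narrow that gap since repeated keys compress well. Measuring on a sample of real data is the only reliable way to estimate bandwidth.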

RajeshNarayanan
Community Member

Hi @Edina_Tipter ,

Thanks, this thread gives great insight into Canvas Data 2. I work at an R1 institution that has implemented Canvas and extensive dashboards using Canvas Data 1. We are at the preliminary planning stages for Canvas Data 2, and I have the following questions:

1. Is there an updated site for table and field mappings from Canvas Data 1 -> Canvas Data 2? I do see some Google Sheets referenced in other discussions, but they date from early 2021.

2. Is there a defined timeline for implementation and a recommended date for user adoption?

3. Requests table: We use the requests table for some approximations, and I read that it is not supported in Canvas Data 2. If this is true, what would Canvas recommend to users for similar datasets?

Thanks

nhumphries
Community Explorer

SQL Table definitions would be most useful