We recently had a few of our tables fail to import incremental data during our nightly job. When we looked into the issue, it appeared to be related to the schema API endpoint, which is no longer accurate for a handful of tables.
As part of our workflow, we apply the schema from the API to the downloaded incremental or snapshot data in a dataframe before performing an upsert into our tables. We adjusted our job to use some schema evolution options, which took care of the failures, but I wanted to point out this finding in case anyone else has CD2 workflows that are running into the same anomaly.
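For anyone without built-in schema evolution options, the idea can be approximated by hand. This is only a minimal sketch under my own assumptions, not our actual job: `align_rows` is a hypothetical helper, the rows are plain dicts rather than a dataframe, and the column names are made up. It keeps columns that appear in the data but not in the schema instead of failing on them.

```python
def align_rows(rows, schema_columns):
    """Align rows to the schema columns, keeping unexpected columns.

    Hypothetical helper: `rows` is a list of dicts (one per record) and
    `schema_columns` is the column list taken from the schema.
    Returns (aligned_rows, extra_columns).
    """
    known = list(schema_columns)
    extras = []
    # Collect columns present in the data but absent from the schema,
    # preserving the order in which they are first seen.
    for row in rows:
        for col in row:
            if col not in known and col not in extras:
                extras.append(col)
    target = known + extras
    # Fill columns that a row does not carry with None so every row
    # has the same shape before the upsert.
    aligned = [{col: row.get(col) for col in target} for row in rows]
    return aligned, extras


# Made-up example: the data carries one column the schema does not list.
rows = [{"id": 1, "name": "a", "new_col": "x"}, {"id": 2, "name": "b"}]
aligned, extras = align_rows(rows, ["id", "name"])
```

Logging `extras` on each run is a cheap way to notice when the schema endpoint and the data files drift apart.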
I checked the JSON returned by the schema API, the web version (https://api-gateway.instructure.com/doc/), and the data files themselves (only for quiz_questions). Only the schema API endpoint is inconsistent. The most interesting case we found is context_module_progressions: the missing column is labeled "required" in the web schema, but it is neither on the required list in the API nor present among the other fields in the API. One observation we made is that, for every table where we found a mismatch, the missing column was always the last column.
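The comparison described above can be scripted roughly as follows. This is a sketch under assumptions: it takes a JSON Schema style object with `properties` and `required` keys (where exactly that object sits inside the schema endpoint response is not shown here), `diff_columns` is a hypothetical helper, and the column names in the example are made up.

```python
def diff_columns(value_schema, data_columns):
    """Compare columns declared in a schema object with columns in the data.

    Assumption: `value_schema` is a JSON Schema style dict with a
    "properties" mapping and an optional "required" list.
    `data_columns` is the ordered column list read from a data file.
    """
    schema_cols = list(value_schema.get("properties", {}))
    missing_from_schema = [c for c in data_columns if c not in schema_cols]
    missing_from_data = [c for c in schema_cols if c not in data_columns]
    return {
        "missing_from_schema": missing_from_schema,
        "missing_from_data": missing_from_data,
        # True when the only mismatch is the last column of the data file.
        "only_last_column_missing": (
            bool(data_columns) and missing_from_schema == [data_columns[-1]]
        ),
    }


# Made-up example: the schema lacks the final column found in the data.
schema = {
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id"],
}
report = diff_columns(schema, ["id", "name", "last_col"])
```

Running this per table would produce the same kind of list of mismatches that we put together by hand.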
I attached some screenshots to show the issue. I did a pretty print of the API schema for readability.
The JSON schema returned by the DAP API schema endpoint and the layout of the output returned for a snapshot/incremental query should be consistent. Our Python client library relies on the response returned by the schema endpoint to synchronize a local copy of the database with DAP; any inconsistencies may cause data loss. We are looking into why the schema endpoint is not consistent with the data layout. I am sure the Product Manager for CD2 will get in touch with you shortly.
The documentation for DAP API is informative. While it should be accurate, our tooling (e.g. the client library) uses the schema returned by the DAP API schema endpoint, not the contents of the web page.
Thanks for the reply. If there is any information we can help with, please let me know.
Our tables were originally constructed via the API responses as well.
We assume the issue started on either 06/12 or 06/13. Our jobs run in the early morning, US Eastern time, and the import failures we experienced were on the morning of 06/13. I realized that I had forgotten to include the exact date in my original post.
Since in every case we found the missing column was sequentially the last column, we assume it was probably a side effect of some other change.
I'll post a list of all the tables we found, in case it helps them troubleshoot the issue.