New Columns in the Assignments Table

sgergely
Instructure
Instructure
10
1274

Canvas.png

We are excited to announce updates to the assignments table in the DAP Query API dataset, designed to enhance your data analysis capabilities. The following new columns will be added:

  • parent_assignment_id: This column may contain foreign key references to other assignments. Existing rows created before December 9, 2024, will have NULL values unless a snapshot is taken after this date.
  • type: This is a varchar field that will always have a value on our side. Initially, it will contain “Assignment,” but it may also support “SubAssignment” in the future. Please note, there will be no NOT NULL constraint or default value in your database schema; rows created before December 9, 2024, will show NULL unless a snapshot is made.
  • has_sub_assignments: This is a boolean field that may indicate whether an assignment contains sub-assignments. While our schema will ensure it always has a value, it will not have a NOT NULL constraint or default value in your schema. For rows created before December 9, 2024, this field will be NULL unless updated by a snapshot.

Important Notes for Customers

  • If you have not performed a snapshot, these new fields will appear as NULL for rows created prior to mid-December 2024.
  • Your database schema will not enforce constraints like NOT NULL or defaults for these fields, allowing you to customize usage according to your needs.

These changes will go live on December 9th, 2024. Please review your queries and data pipelines to ensure they are prepared to handle these new fields effectively.

10 Comments
ldavenport4015
Community Participant

@sgergely I've passed this information to our ITS and they came back with the following question: 

Do they recommend us just adding the fields to our local database manually (i.e. an ALTER TABLE statement to add each new field), or do they recommend we drop and rebuild the whole table/database from scratch using the provided schema APIs in order to incorporate the new fields?

Also, can we expect this documentation to be updated with new fields?

sgergely
Instructure
Instructure
Author

@ldavenport4015 thank you for the questions!

It is hard to recommend a solution without knowing your architecture or integration setup.

If you are using DAP CLI then it will create the automatically the ALTER TABLE statement and takes care of the update. Dropping the table and requesting a snapshot could also work but it might be costly for you to download and insert the whole table again.

So all-in-all but both of the suggested approach would work but it depends on your setup.

The documentation is going to be updated once the data is available through DAP Query API. The documentation always shows the current live, production state of the API.

How do you like the new documentation anyways?

ldavenport4015
Community Participant

@sgergely Thanks for the quick reply! 

It's in the hands of ITS and they are working through the best way to implement the update to the table.

As to your question on the documentation - It's great! Our LMS Manager created our original documentation to check our CD2 warehouse data using the Canvas API, converting JSON to a spreadsheet for each table. This would have saved a ton of time! And I love the Entity Relationship Diagram  - I call it the "world according to Canvas". Thank you for your investment in documentation.

sgergely
Instructure
Instructure
Author
mclark19
Community Participant

@sgergely Not sure if this has been an issue with other institutions, but we aren't seeing "parent_assignment_id" in our data, though we are seeing the other two new columns ("type" and "has_sub_assignments"). We're using the JSON file format and have our own process (not DAP client). The key "parent_assignment_id" does not occur anywhere in the data file....

ldavenport4015
Community Participant

@mclark19 We successfully updated the database yesterday, but it took a couple of tries. We do have all three of the new columns in our assignments table.  The first run was a mess - missing a lot of data from all the tables and only one of the new columns on that table came in. I asked out IT folks if they had any advice for you, and they mentioned (just like @sgergely ) that it is very dependent on your process. But here is their reply: 

The only advice I can give, without looking at their solution, is to basically do what we did. If possible, run the process in steps. So first run the part that just reads the schema (creating an empty database). Then run the part that actually loads the data. If they can run their process but just focused on creating the affected table that would be even better. It really is hard to give advice on something like this without seeing exactly what they are doing though


Hope that helps!

mclark19
Community Participant

Thanks @ldavenport4015 . I've put in a ticket with Instructure. I pulled some data this morning and noticed that we had all three columns if I request TSV, but not if I request JSON, which is what our process uses. Not sure if it is just a timing issue (things getting cached in DAP as the change is getting rolled out) or some other issue.  We're pushing things to Redshift which is not ideal for dealing with things like schema changes. If we can get the data issue sorted out, I'm fairly confident we can get the rest of the process working. 

stimme
Community Coach
Community Coach

@mclark19 I suspect that columns with all null values are omitted from the objects in the JSON files. (For example, I found that the JSON for canvas.roles for my institution does not have deleted_at key/value pairs, and that seems to be because that column is null for every row.) Right now, parent_assignment_id should be null for all assignments (the parent assignment relationship seems to exist for graded discussions with checkpoints, which is scheduled for release next month). I would expect that key to appear once it has non-null values.

mclark19
Community Participant

Thanks @stimme. Yeah, that makes sense. We just managed a successful run. It turns out that the culprit in our case was the column ordering. Long story short, we use Redshift, so running an alter table command appends columns by default. We programmatically created a jsonpaths file based on the new schema, but that didn't account for the ordering mismatch between the schema specified by Instructure due to that appending behavior. I mistakenly attributed the failing job to the "missing" column in the data rather than the column ordering issue in the jsonpaths file, which is what Redshift uses to match the data with the columns. Thanks to all who weighed in.

sgergely
Instructure
Instructure
Author

Hey @mclark19@stimme, yes, JSONL skip values if they have value NULL. That is the reason you don't see it it in the jsonl format and see it in the TSV, CSV, Parquet formats. I have realized this should be added to our documentation, which I will update ASAP.

The columns are going to be used by Discussion Checkpoints which is not yet released as it is communicated here, this is why the columns are empty yet.