Alan Curry

SIS Import diffing mode with overlapping data sets

Discussion created by Alan Curry on May 4, 2018

We have a PowerSchool sync process using daily uploads in diffing mode. This was set up by Instructure using an OAuth plugin on our PowerSchool server. I don't have any access to the CSVs or the code that creates them (which is a continual source of frustration).


There is a second stream of SIS data that we want to import: summer school. I'm now creating a set of CSVs with summer data and sending them to the SIS Import API with a "summer2018" diffing_data_set_identifier.
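
For anyone following along, here is a minimal sketch of how the upload URL for the summer data set might be built. The host name, the "self" account id, and the helper name sis_import_url are placeholders of mine, not our real setup; the query parameters are the ones I understand the SIS Import API to accept, so check them against the API docs before relying on this. It only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def sis_import_url(base_url, account_id, data_set_id):
    """Build the SIS Import upload URL for one diffing data set.

    base_url and account_id are placeholders; only the URL is built here,
    no request is sent.
    """
    params = {
        "import_type": "instructure_csv",
        "extension": "zip",
        # Uploads sharing this identifier are diffed against each other.
        "diffing_data_set_identifier": data_set_id,
    }
    return f"{base_url}/api/v1/accounts/{account_id}/sis_imports?{urlencode(params)}"


print(sis_import_url("https://canvas.example.edu", "self", "summer2018"))
```

The zip of CSVs would then be POSTed to that URL with an API token in the Authorization header.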


The 2 data sets each do their own thing, and removals are handled by the diffing mode. There is a potential problem, though, where the 2 data sets overlap.


An example where something bad happened: Student X is in the main data set, enrolled in a regular high school with a normal schedule. Her account was added to Canvas as part of our original sync a few years ago. She signs up for a summer school course, and her information gets uploaded as part of the "summer2018" diffing data set. She cancels the summer school enrollment. In the next upload of the "summer2018" diffing data set, Student X is no longer in users.csv. Canvas deletes the user.
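
To make the failure concrete, here is a toy model of what I think happened. This is only my guess at the logic, consistent with what I observed, and not Canvas's actual implementation; the user ids and the function name removed_ids are made up.

```python
def removed_ids(previous_upload, current_upload):
    """Toy diff: ids present in the previous upload but missing from the current one."""
    return set(previous_upload) - set(current_upload)


# Hypothetical ids. student_x is in both data sets.
main_users = {"student_x", "student_y"}
summer_before = {"student_x", "student_z"}
summer_after = {"student_z"}  # student_x cancelled summer school

# The diff only looks at the summer data set's own history...
to_delete = removed_ids(summer_before, summer_after)
print(to_delete)

# ...so it flags a user who is still active in the main data set.
print(to_delete & main_users)
```

Both prints show student_x: the summer diff has no idea that the main data set still wants her.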


So I wonder: Is this the intended behavior? Before this happened, I wouldn't have guessed that a user who is active in 2 data sets would be deleted when they become inactive in only 1 of those data sets.


It appears that this assumption was too optimistic. So what can I do to work around it? I don't want to give the users different SIS IDs in the summer data set. They would end up with 2 different Canvas accounts and that's awful.


I could use the skip_deletes flag on the SIS Import upload, and upload users.csv separately so that the rest of the CSVs can still have deletes processed normally. That's a hack, though. Skipping all deletes is a poor approximation to the correct behavior, which is to skip deletes of users that are active in the main data set.
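
A rough sketch of that split, as a plan only (nothing is uploaded). The function name plan_uploads and the separate "-users" identifier are my own assumptions: I'm guessing the users upload would need its own diffing_data_set_identifier so that each upload is diffed against its own previous version, and I haven't confirmed how Canvas treats this.

```python
def plan_uploads(csv_files, data_set_id):
    """Split one batch of CSVs into two uploads: users (no deletes) and the rest."""
    users = [f for f in csv_files if f == "users.csv"]
    others = [f for f in csv_files if f != "users.csv"]
    return [
        # Users go first, with deletes skipped, under an assumed separate identifier.
        {
            "files": users,
            "diffing_data_set_identifier": f"{data_set_id}-users",
            "skip_deletes": True,
        },
        # Everything else keeps normal diffing, so removals are still processed.
        {
            "files": others,
            "diffing_data_set_identifier": data_set_id,
            "skip_deletes": False,
        },
    ]


for upload in plan_uploads(["users.csv", "courses.csv", "enrollments.csv"], "summer2018"):
    print(upload)
```

The downside stays the same: a summer-only user who drops out would never be deleted by this plan.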


Is there an option I missed, or are overlapping data sets just hopeless?


One crazy desperate option is to trim the summer users.csv down to just the users that aren't in the main data set. But that's hard to do without access to the main data set's CSVs.
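
If I could somehow get the list of SIS user IDs in the main data set, the trimming itself would be simple. A sketch, assuming the summer users.csv has a user_id column and that main_user_ids is a set I'd have to obtain from somewhere (which is exactly the part I don't have); trim_users_csv is just a name I made up.

```python
import csv
import io


def trim_users_csv(summer_csv_text, main_user_ids):
    """Return the summer users.csv text without rows whose user_id is in the main data set."""
    reader = csv.DictReader(io.StringIO(summer_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep only users that exist solely in the summer data set.
        if row["user_id"] not in main_user_ids:
            writer.writerow(row)
    return out.getvalue()


# Made-up sample data.
summer = "user_id,login_id,status\nS1,alice,active\nS2,bob,active\n"
print(trim_users_csv(summer, {"S1"}))
```

With the sample data, only the S2 row survives, so the summer diff could never touch S1.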