Canvas Data Sanity Checks? (Pre and Post Download)

lfeng1
Community Participant

Posting this question on behalf of a customer: 

When the Canvas Data files are generated, what checks, if any, are in place to verify the completeness or integrity of the data? For example, how would one detect whether any of the following has occurred:

  • Missing files: one or more files are not included in a dump
  • Missing data within a file: records within a file may be missing or may not reflect the latest dump
  • Incomplete data: attributes of Canvas objects are not present (I suspect this one is rather difficult to detect)
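For the third scenario (incomplete data), one rough signal is a column that is NULL in nearly every row of a table's file. The sketch below computes per-column null rates and also rejects rows with the wrong number of fields, which can hint at truncated or malformed records. It assumes the flat files are gzip-compressed, tab-separated, have no header row, and write NULL as `\N`. The column count and names would have to come from the Canvas Data schema, and the helper names and the 0.99 threshold are hypothetical.

```python
import gzip
from typing import Iterable, List, Sequence

# Assumption: Canvas Data flat files write NULL as the two characters \N.
NULL_MARKER = "\\N"


def null_rates(lines: Iterable[str], num_columns: int) -> List[float]:
    """Fraction of rows whose value is NULL_MARKER, for each column."""
    nulls = [0] * num_columns
    rows = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # A wrong field count suggests a truncated or malformed record.
        if len(fields) != num_columns:
            raise ValueError(
                f"row {rows + 1}: expected {num_columns} fields, got {len(fields)}"
            )
        rows += 1
        for i, value in enumerate(fields):
            if value == NULL_MARKER:
                nulls[i] += 1
    return [n / rows if rows else 0.0 for n in nulls]


def null_rates_from_file(path: str, num_columns: int) -> List[float]:
    """Stream a gzipped, tab-separated flat file (assumed: no header row)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return null_rates(fh, num_columns)


def suspicious_columns(
    rates: Sequence[float], names: Sequence[str], threshold: float = 0.99
) -> List[str]:
    """Columns that are NULL in at least `threshold` of the rows."""
    return [name for name, rate in zip(names, rates) if rate >= threshold]


if __name__ == "__main__":
    rates = null_rates(["1\tAlice\t\\N\n", "2\t\\N\t\\N\n"], 3)
    print(rates)  # [0.0, 0.5, 1.0]
    print(suspicious_columns(rates, ["id", "name", "email"]))  # ['email']
```

A high null rate is not proof of missing data, since some attributes are legitimately sparse. Comparing rates against the previous dump is likely more informative than a fixed threshold.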

I know that when using the Canvas Data API, one can compare the 'numFiles' property, and possibly others such as the 'createdAt' timestamp, but would it be useful for the Canvas Data API to return additional measures of correctness? For example, a checksum for each file?
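Even without a per-file checksum from the API, a client can catch some download problems on its own. This is a minimal sketch that assumes each dump's files are downloaded as `.gz` files into a single directory and that the dump metadata is available as a dict with a `numFiles` key. It compares the file count, fully decompresses each file (gzip's CRC-32 and length trailer make truncated or corrupted files fail to decompress), and records a local SHA-256 manifest for comparing re-downloads. The function names are hypothetical. Whether `numFiles` counts exactly the files you download should be confirmed against the API documentation. Also note that a local manifest only shows whether two copies match, not that either copy matches what was generated server-side.

```python
import gzip
import hashlib
import zlib
from pathlib import Path
from typing import Dict, List


def gzip_is_intact(path: Path) -> bool:
    """Fully decompress the file; truncation or corruption raises an error."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1 << 20):  # read in 1 MiB chunks until EOF
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header or CRC/length mismatch);
        # EOFError is raised when the stream ends early (truncated download).
        return False


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_dump(dump: Dict, download_dir: Path) -> List[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    files = sorted(download_dir.glob("*.gz"))
    expected = dump.get("numFiles")  # assumed key in the dump metadata
    if expected is not None and len(files) != expected:
        problems.append(f"expected {expected} files, found {len(files)}")
    for f in files:
        if not gzip_is_intact(f):
            problems.append(f"corrupt or truncated: {f.name}")
    return problems


def build_manifest(download_dir: Path) -> Dict[str, str]:
    """Map each downloaded file name to its SHA-256, for comparing later downloads."""
    return {f.name: sha256_of(f) for f in sorted(download_dir.glob("*.gz"))}
```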

At the moment, we perform the following downstream checks once the data is loaded into the warehouse:

  • Check that tables are present with the proper permissions
  • Check that the expected tables and columns are present
  • Ensure row counts in the warehouse tables match record counts in the Canvas Data flat files
  • Check the number of accounts
  • Check the number of available courses
  • Verify the most recent user login
  • Verify the earliest-created user login
  • Verify the number of users by workflow state
  • Count the total number of assignments by workflow state
  • Check the timestamp range of requests table entries
  • Check the date range of requests table events
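The row-count comparison and the timestamp-range checks in the list above lend themselves to scripting. This sketch rests on a few assumptions. It expects one record per line in the gzipped flat files, with no embedded newlines. It uses `sqlite3` as a stand-in for the warehouse connection; other DB-API drivers would go through a cursor instead of `conn.execute`. The table and column names are hypothetical and should come from your own configuration, not from user input, because they are interpolated into the SQL.

```python
import gzip
import sqlite3
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple


def count_records(paths: Iterable[Path]) -> int:
    """Count lines across a table's gzipped flat files (assumes one record per line)."""
    total = 0
    for path in paths:
        with gzip.open(path, "rb") as fh:
            total += sum(1 for _ in fh)
    return total


def compare_row_counts(
    conn: sqlite3.Connection, file_counts: Dict[str, int]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {table: (expected, actual)} for every table whose counts differ.

    actual is None when the table is missing from the warehouse.
    """
    mismatches: Dict[str, Tuple[int, Optional[int]]] = {}
    for table, expected in file_counts.items():
        try:
            # Table names come from trusted config; they are quoted, not parameterized.
            actual = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        except sqlite3.OperationalError:  # e.g. "no such table"
            actual = None
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches


def value_range(conn: sqlite3.Connection, table: str, column: str) -> Tuple:
    """(MIN, MAX) of a column, e.g. the timestamp range of the requests table."""
    return conn.execute(
        f'SELECT MIN("{column}"), MAX("{column}") FROM "{table}"'
    ).fetchone()
```

Storing each run's counts and ranges makes it possible to alert on sudden drops between dumps as well as outright mismatches.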

If others have similar checks, please feel free to share them here! Thanks!