Hey Mike,
Per the data portal in our Canvas environment:
For all file downloads, please note that the dates specified do not reflect the actual dates of the data, but instead when the data was finished exporting. The most recent data in a given export is generally 24-36 hours older than the date given. An exception is that the request table is 48 hours behind due to the large amount of data that needs to be processed. All dates are in UTC.
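If it helps, that lag can be turned into a rough window like this. It is only a sketch in Python: the function name and the `is_requests_table` flag are made up for illustration, and the 24-36 hour and 48 hour figures are taken from the portal note above, not from anything guaranteed.

```python
from datetime import datetime, timedelta, timezone


def newest_data_window(export_finished_utc, is_requests_table=False):
    """Estimate when the newest data in an export was actually recorded.

    Returns an (earliest, latest) pair of UTC datetimes. The offsets come
    from the portal note (24-36 hours, or 48 hours for the request table)
    and are approximations, not guarantees.
    """
    if is_requests_table:
        # The request table is described as being 48 hours behind.
        point = export_finished_utc - timedelta(hours=48)
        return point, point
    earliest = export_finished_utc - timedelta(hours=36)
    latest = export_finished_utc - timedelta(hours=24)
    return earliest, latest


# Example: an export stamped 2024-01-10 12:00 UTC (a hypothetical date)
export_stamp = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
window = newest_data_window(export_stamp)
```

So a course changed within the day or so before the export stamp may simply not be in the file yet.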
The documentation says the following about workflow_state in course_dim, which should be a good indicator:
Workflow status indicating the current state of the course, valid values are: completed (course has been hard concluded), created (course has been created, but not published), deleted (course has been deleted), available (course is published, and not hard concluded), claimed (course has been undeleted, and is not published).
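As a rough illustration, the states quoted above could be checked along these lines. This is a minimal Python sketch: the dictionary and helper names are my own, and only the five state values and their meanings come from the documentation.

```python
# Meanings copied from the course_dim documentation quoted above.
WORKFLOW_STATES = {
    "completed": "course has been hard concluded",
    "created": "course has been created, but not published",
    "deleted": "course has been deleted",
    "available": "course is published, and not hard concluded",
    "claimed": "course has been undeleted, and is not published",
}


def describe_state(workflow_state):
    """Return the documented meaning of a state, or flag it as unknown."""
    return WORKFLOW_STATES.get(workflow_state, "unknown workflow_state")


def is_currently_published(workflow_state):
    """True only for 'available', the one state documented as published
    and not hard concluded."""
    return workflow_state == "available"


# Hypothetical rows standing in for records read from a course_dim export.
rows = [
    {"id": 1, "workflow_state": "available"},
    {"id": 2, "workflow_state": "created"},
    {"id": 3, "workflow_state": "deleted"},
]
published_ids = [r["id"] for r in rows if is_currently_published(r["workflow_state"])]
```

Filtering on `available` alone is an assumption about what counts as an active course here; if concluded courses matter too, `completed` would need to be included.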
One last note: I have either read or been told that the files are a "best attempt" at providing the data and are meant for large-scale aggregation rather than for targeting specific records, so that could also be a contributing factor.
Thanks,
Ben