Hello Canvas Data community, we are struggling with this issue:
We are downloading the Canvas Data files, and in the last few days we noticed that assignments were missing from the #assignment_dim file; we discovered this while checking the #assignment_group_score_fact file.
What could be the reason for this behavior?
How can we be sure that our Canvas Data files are consistent and have been downloaded without errors?
Is there an MD5 option to verify the integrity of the Canvas Data files?
If you need more information about this, please let me know.
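For reference, one way to check integrity locally is to hash each downloaded gzip part and compare it against whatever checksum your download tooling records (whether the Canvas Data API exposes a per-file checksum is something we'd need to confirm with Instructure). A minimal Python sketch, where the expected-hash dictionary is a placeholder you would fill from your own records:

```python
import gzip
import hashlib
from pathlib import Path

def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file, reading in chunks so large gzip parts are fine."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_downloads(download_dir: str, expected: dict[str, str]) -> None:
    """Compare local MD5s against expected hashes.
    `expected` is hypothetical: fill it from wherever your download process records checksums."""
    for path in sorted(Path(download_dir).glob("*.gz")):
        actual = md5_of_file(path)
        want = expected.get(path.name)
        status = "OK" if want == actual else f"MISMATCH (expected {want})"
        print(f"{path.name}: {actual} {status}")

def gzip_is_readable(path: Path) -> bool:
    """Even without reference checksums, check that a part decompresses cleanly end to end."""
    try:
        with gzip.open(path, "rb") as fh:
            for _ in fh:
                pass
        return True
    except (OSError, EOFError):
        return False
```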
I often suspect this is caused by the batch process itself, and it usually resolves with the next dump set. However, if this has been happening for a few days, that doesn't make as much sense.
Here's where I'd start troubleshooting.
Comparing assignment_dim and assignment_fact, how many rows do you have?
Looking at the gzip files, do they appear to be approximately the same size? I've noticed that when the batch crashes, the gzip collection for the table has file sizes all over the place... we see this when we have duplicate rows.
If you have more than 1 file for each table collection, are the files balanced, with 1M rows each?
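For the size and row-count checks above, something like this quick Python sketch will list each part's compressed size and line count so you can compare the dim and fact tables for the same dump (the glob pattern and directory layout are assumptions; adjust them to match how your files are named):

```python
import gzip
from pathlib import Path

def summarize_parts(download_dir: str, table: str) -> None:
    """Print the compressed size and row count of each gzip part for one table."""
    total_rows = 0
    for path in sorted(Path(download_dir).glob(f"{table}*.gz")):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            rows = sum(1 for _ in fh)
        total_rows += rows
        size_mib = path.stat().st_size / 1_048_576
        print(f"{path.name}: {size_mib:.1f} MiB, {rows:,} rows")
    print(f"{table} total: {total_rows:,} rows")

# Compare the dimension and fact tables from the same dump:
summarize_parts("dumps/latest", "assignment_dim")
summarize_parts("dumps/latest", "assignment_fact")
```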
Alternatively, if you have all the rows in the gzip files, make sure that they are all being imported correctly. Sometimes imports fail when we don't set up the columns correctly for length, data types, or UTF-8 (including Unicode and emoji characters). Depending on the import utility you're using and its settings, you may end up with a partial import when commits fail on duplicate rows. If you're relying on ENUM constraints, you may have new rows with a value you're not accounting for, and those rows are being rejected.
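If the rows are all present in the gzip files but some don't survive the import, one way to narrow it down is to scan the raw file for the usual offenders before loading. A rough sketch along those lines, assuming tab-delimited rows and the key in the first field (check both against the table schema and your dump format):

```python
import gzip
from collections import Counter

def scan_for_import_problems(path: str, key_index: int = 0, max_len: int = 256) -> None:
    """Flag rows that commonly break imports: duplicate keys, over-long fields,
    and lines that are not valid UTF-8. The delimiter, key_index, and max_len
    are assumptions; match them to your schema and import settings."""
    keys = Counter()
    with gzip.open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                line = raw.decode("utf-8")
            except UnicodeDecodeError:
                print(f"line {lineno}: not valid UTF-8")
                continue
            fields = line.rstrip("\n").split("\t")
            keys[fields[key_index]] += 1
            for i, field in enumerate(fields):
                if len(field) > max_len:
                    print(f"line {lineno}: field {i} is {len(field)} chars (over {max_len})")
    dupes = {k: n for k, n in keys.items() if n > 1}
    if dupes:
        print(f"{len(dupes)} duplicate key values, e.g. {next(iter(dupes))}")

scan_for_import_problems("dumps/latest/assignment_dim-00000.gz")
```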
If you test all this and it's consistently happening, you may want to check with canvasdatahelp@instructure.com; they are extremely fast and helpful.
Thanks a lot carroll-ccsd, I will ask my teammates how they perform the Canvas Data download to double-check whether there are any issues with the files. I've also opened an email ticket with Canvas Data Help requesting guidance on this issue and on something related to a development that adds extra information to our Gradebook. I'll come back with news.