missing information in assignment_dim file

ddaza
Community Novice

Hello Canvas Data community, we are struggling with this issue:

We download the Canvas Data files regularly. In the last few days we noticed that assignments were missing from the #assignment_dim file; we discovered this while checking the #assignment_group_score_fact file.
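To reproduce this kind of check, here is a minimal sketch that looks for fact rows whose assignment id has no matching dim row. It assumes Canvas Data's gzipped, tab-separated, headerless file format; the file names and column positions below are placeholders (including the synthetic demo data) and should be adjusted to your own dump and schema.

```python
import gzip
import pathlib
import tempfile

def column_values(path, col):
    """Collect the distinct values of one column from a gzipped TSV file."""
    values = set()
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > col:
                values.add(fields[col])
    return values

# Synthetic stand-ins for real dump fragments, so the sketch is runnable.
tmp = pathlib.Path(tempfile.mkdtemp())
dim_file = tmp / "assignment_dim-00000.gz"
with gzip.open(dim_file, "wt", encoding="utf-8") as f:
    f.write("101\tHomework 1\n102\tHomework 2\n")
fact_file = tmp / "assignment_group_score_fact-00000.gz"
with gzip.open(fact_file, "wt", encoding="utf-8") as f:
    f.write("1\t10\t101\n2\t10\t103\n")  # 103 has no row in the dim file

dim_ids = column_values(dim_file, 0)    # id assumed to be column 0
fact_ids = column_values(fact_file, 2)  # assignment id assumed to be column 2
missing = sorted(fact_ids - dim_ids)
print(missing)  # -> ['103']
```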

What could be the reason for this behavior?

How can we be sure that our Canvas Data files are consistent and were downloaded without errors?

Is there an MD5 option to verify the integrity of Canvas Data files?
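On the MD5 part: whether Instructure publishes a checksum alongside each file is worth confirming in the Canvas Data documentation, but once an expected digest is available, verifying a download locally is straightforward. A minimal sketch:

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against whatever checksum the download source publishes, e.g.:
# assert md5_of_file("assignment_dim-00000.gz") == expected_md5
```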

If you need more information about this, please let me know.

1 Solution
robotcars
Community Champion

I often suspect this is caused by the batch process itself, and it is usually resolved with the next dump set. However, if it has been happening for a few days, that doesn't make as much sense.

Here's where I'd start troubleshooting.

Comparing assignment_dim and assignment_fact, how many rows do you have?

Looking at the gzip files, do they appear to be approximately the same size? I've noticed that when the batch crashes, the gzip collection for the table has file sizes that are all over the place; we also see duplicate rows when that happens.

If you have more than one file in a table collection, are the files balanced, with roughly 1M rows each?
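The size and row-count checks above can be sketched in a few lines. This assumes the fragments for one table sit in a single directory and are named `<table>-*.gz`, which is a guess at the layout, so adjust the glob to match your downloads:

```python
import gzip
import pathlib

def fragment_stats(directory, table):
    """Report (filename, bytes, rows) for each gzip fragment of one table."""
    stats = []
    for path in sorted(pathlib.Path(directory).glob(f"{table}-*.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            rows = sum(1 for _ in f)
        stats.append((path.name, path.stat().st_size, rows))
    return stats

# Uneven byte sizes or row counts across fragments are a red flag:
for name, size, rows in fragment_stats(".", "assignment_dim"):
    print(f"{name}: {size} bytes, {rows} rows")
```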

Alternatively, if you have all the rows in the gzip files, make sure they are all being imported correctly. Sometimes imports fail when the columns aren't set up correctly for length, data types, or UTF-8 (including Unicode and emoji characters). Depending on the import utility you're using and its settings, you may get a partial import when commits fail on duplicate rows. If you're relying on ENUM constraints, new rows may contain a value you're not accounting for, and those rows are being rejected.
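A pre-import duplicate check along these lines can flag rows that would make commits fail. This is a sketch rather than any official Canvas tooling; it assumes the primary key is the first tab-separated column, which should be verified against the table's schema:

```python
import gzip
from collections import Counter

def duplicate_keys(paths, key_col=0):
    """Return {key: count} for keys appearing more than once across fragments."""
    counts = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if len(fields) > key_col:
                    counts[fields[key_col]] += 1
    return {key: n for key, n in counts.items() if n > 1}

# Example (hypothetical fragment names):
# dupes = duplicate_keys(["assignment_dim-00000.gz", "assignment_dim-00001.gz"])
# if dupes:
#     print(f"{len(dupes)} duplicated keys found")
```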

If you test all of this and it's still happening consistently, you may want to check with canvasdatahelp@instructure.com; they are extremely fast and helpful.
