missing information in assignment_dim file

ddaza
Community Novice

Hello Canvas Data community, we are struggling with this issue:

We download the Canvas Data files regularly. In the last few days we noticed that assignments were missing from the #assignment_dim file; we discovered this while checking the #assignment_group_score_fact file.
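To reproduce this kind of check, here is a minimal sketch that looks for fact rows whose assignment id has no matching dim row. It assumes Canvas Data's gzipped, tab-separated, headerless file format; the file names and column positions below are placeholders (including the synthetic demo data) and should be adjusted to your own dump and schema.

```python
import gzip
import pathlib
import tempfile

def column_values(path, col):
    """Collect the distinct values of one column from a gzipped TSV file."""
    values = set()
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > col:
                values.add(fields[col])
    return values

# Synthetic stand-ins for real dump fragments, so the sketch is runnable.
tmp = pathlib.Path(tempfile.mkdtemp())
dim_file = tmp / "assignment_dim-00000.gz"
with gzip.open(dim_file, "wt", encoding="utf-8") as f:
    f.write("101\tHomework 1\n102\tHomework 2\n")
fact_file = tmp / "assignment_group_score_fact-00000.gz"
with gzip.open(fact_file, "wt", encoding="utf-8") as f:
    f.write("1\t10\t101\n2\t10\t103\n")  # 103 has no row in the dim file

dim_ids = column_values(dim_file, 0)    # id assumed to be column 0
fact_ids = column_values(fact_file, 2)  # assignment id assumed to be column 2
missing = sorted(fact_ids - dim_ids)
print(missing)  # -> ['103']
```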

What could be the reason for this behavior?

How can we be sure that our Canvas Data files are consistent and were downloaded without errors?

Is there an MD5 option to verify the integrity of Canvas Data files?
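On the MD5 part: whether Instructure publishes a checksum alongside each file is worth confirming in the Canvas Data documentation, but once an expected digest is available, verifying a download locally is straightforward. A minimal sketch:

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against whatever checksum the download source publishes, e.g.:
# assert md5_of_file("assignment_dim-00000.gz") == expected_md5
```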

If you need more information about this, please let me know.

1 Solution
robotcars
Community Champion

I often suspect this is caused by the batch process itself, and it is usually resolved with the next dump set. However, if it has been happening for a few days, that doesn't make as much sense.

Here's where I'd start troubleshooting.

Comparing assignment_dim and assignment_fact, how many rows do you have?

Looking at the gzip files, do they appear to be approximately the same size? I've noticed that when the batch crashes, the gzip collection for the table has file sizes that are all over the place; we also see duplicate rows when that happens.

If you have more than one file in a table collection, are the files balanced, with roughly 1M rows each?
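The size and row-count checks above can be sketched in a few lines. This assumes the fragments for one table sit in a single directory and are named `<table>-*.gz`, which is a guess at the layout, so adjust the glob to match your downloads:

```python
import gzip
import pathlib

def fragment_stats(directory, table):
    """Report (filename, bytes, rows) for each gzip fragment of one table."""
    stats = []
    for path in sorted(pathlib.Path(directory).glob(f"{table}-*.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            rows = sum(1 for _ in f)
        stats.append((path.name, path.stat().st_size, rows))
    return stats

# Uneven byte sizes or row counts across fragments are a red flag:
for name, size, rows in fragment_stats(".", "assignment_dim"):
    print(f"{name}: {size} bytes, {rows} rows")
```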

Alternatively, if you have all the rows in the gzip files, make sure they are all being imported correctly. Sometimes imports fail when the columns aren't set up correctly for length, data types, or UTF-8 (including Unicode and emoji characters). Depending on the import utility you're using and its settings, you may get a partial import when commits fail on duplicate rows. If you're relying on ENUM constraints, new rows may contain a value you're not accounting for, and those rows are being rejected.
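A pre-import duplicate check along these lines can flag rows that would make commits fail. This is a sketch rather than any official Canvas tooling; it assumes the primary key is the first tab-separated column, which should be verified against the table's schema:

```python
import gzip
from collections import Counter

def duplicate_keys(paths, key_col=0):
    """Return {key: count} for keys appearing more than once across fragments."""
    counts = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                fields = line.rstrip("\n").split("\t")
                if len(fields) > key_col:
                    counts[fields[key_col]] += 1
    return {key: n for key, n in counts.items() if n > 1}

# Example (hypothetical fragment names):
# dupes = duplicate_keys(["assignment_dim-00000.gz", "assignment_dim-00001.gz"])
# if dupes:
#     print(f"{len(dupes)} duplicated keys found")
```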

If you test all of this and it's still happening consistently, you may want to check with canvasdatahelp@instructure.com; they are extremely fast and helpful.
