
Logic of request filenames in Canvas Data?

Hi, I'm trying to find something manually in the request files, but I'm having trouble understanding the logic.

We have roughly 300 request files in our Canvas Data portal, and I can't figure out the logic behind the file names.

Is there a pattern that would let me find the correct file to download?

(I'm trying to figure out who enrolled a student in a course; I have the timestamp.)

Kindly Lars

1 Solution

Accepted Solutions
Community Member

While I can't speak authoritatively, I don't believe there is any logic in the naming structure, at least nothing a person can interpret; the names look like GUIDs or hash values. Instead of hunting for one specific packed file, I would recommend using the CLI tool to download and unpack all of the request files. That produces one consolidated txt file, which you can load into any database and query from there.

I say that fully realizing the process is easier to write about than to accomplish. The CLI tool itself, though, is really easy to use. The final unpacked requests.txt file will be huge. And if the enrollment happened more than a few weeks ago, you'll have to wait for the monthly full data dump rather than the daily incremental dump of the requests table.

6 Replies
Community Member

Hi Sam,

I agree with your comments regarding the naming of the requests*.gz files. Another point to note is that the names of files in the daily dumps (available via the canvasDataCli grab functionality) are structurally different from the names produced by the canvasDataCli sync functionality, so they are not compatible.

Having said that, a sync operation will always provide the full historical set of requests*.gz files. Unfortunately, the historical dumps cause these files to be repackaged and renamed, so the full set needs to be downloaded again.

We maintain a full set of data locally, updated daily, and separate out the incremental requests*.gz files to unpack based on timestamp. This works well until a problem causes the daily dumps to contain incomplete data. Filling those gaps requires a full reload, which, as you say, is very onerous.
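The "separate the incremental requests*.gz files based on timestamp" step above can be sketched roughly as follows. This assumes a hypothetical local layout where each day's incremental files are filed under a dated folder (e.g. dumps/2018-03-01/requests-*.gz) at download time, since the opaque file names themselves carry no usable date:

```python
import glob
import gzip
import os

def unpack_day(dump_root, day):
    """Stream-decompress every requests-*.gz under one dated dump folder.

    Assumes a hypothetical layout of <dump_root>/<YYYY-MM-DD>/requests-*.gz,
    i.e. incremental files were filed by arrival date when downloaded;
    the gz file names themselves are opaque and carry no date.
    """
    pattern = os.path.join(dump_root, day, "requests-*.gz")
    for gz_path in sorted(glob.glob(pattern)):
        with gzip.open(gz_path, "rt") as f:
            for line in f:
                yield line

# Example: unpack only the day of interest instead of the whole history.
# for line in unpack_day("dumps", "2018-03-01"):
#     process(line)
```

Filing by arrival date this way is what lets you unpack a single day instead of re-processing the entire historical set.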

We have been running for 18 months and have just been through a full reload due to some data problems which occurred in February. The unpacked requests.txt file was a little under 500GB and took 32 hours to upload to an Oracle database using Oracle Data Integrator. Clearly this is not sustainable in the long term.

I have suggested on several occasions that historical dumps be restricted to a specific timeframe (e.g. the last month). There should be no reason to go back further than that, because the data doesn't change.

The alternative is to package the data with meaningful file names (e.g. including a date range), so that only the relevant files need to be separated and unpacked.

I shall be very interested to hear of any progress you make on this.

Regards,

Stuart.


Thanks, both of you, for confirming my suspicions.

I went down the road with the CLI tool but ended up in a situation where we didn't actually have any good way to read the files, meaning we would have had to build a new database or something of that kind.

This was not a life-or-death situation, more a proof of concept, so I think I'll leave it here, but I'm keeping the question in the back of my mind in case we can come up with a better way to accomplish this.

Community Member

Hi Lars,

If you're thinking of building a new database, you may wish to have a look at the Canvas Data Loader tool, which apparently downloads data and uploads it directly into Postgres or MySQL databases. I haven't looked at it because it doesn't support Oracle; we use the canvasDataCli tool. The trick will still be to implement it on a platform with adequate resources to handle the volume of data.

https://community.canvaslms.com/docs/DOC-11943-how-to-use-the-canvas-data-cli-tool 

S.
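For anyone experimenting before committing to Postgres, MySQL, or Oracle, the load-and-query step discussed in this thread can be prototyped at small scale with SQLite. A minimal sketch; the three column names here are placeholders, not the real requests-table schema:

```python
import csv
import sqlite3

def load_requests(txt_path, db_path):
    """Bulk-load a tab-separated requests dump into SQLite for ad-hoc queries.

    The column names (request_id, url, ts) are placeholders; map them to
    the actual requests-table schema from the Canvas Data documentation.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(request_id TEXT, url TEXT, ts TEXT)"
    )
    with open(txt_path, newline="") as f:
        rows = (
            (r[0], r[1], r[2])
            for r in csv.reader(f, delimiter="\t")
            if len(r) >= 3
        )
        conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

# Example query once loaded:
# conn.execute("SELECT * FROM requests WHERE ts BETWEEN ? AND ?", (a, b))
```

At the 500 GB scale Stuart describes this obviously needs a real server-class database, but the shape of the load is the same.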

Community Member

Canvas Data Loader sounds great, and it ran well when I tried it initially, but something has been amiss lately. I spent a lot of time trying to get it working earlier this year and hit some significant performance problems. It doesn't appear to be specific to my attempt, as someone else reported the same problem: https://github.com/instructure/canvas-data-loader/issues/6

I haven’t revisited it recently, so maybe it’s worth a shot?

Jeff

Community Advocate

Hi jeff.longland@ubc.ca

Came across this thread looking for something else.

Not sure if you've seen this post yet: Managing Canvas Data with Embulk

It definitely improves the performance of importing Canvas Data.