Initial Canvas Data Sync Questions
I have the CanvasDataCLI installed and running. Thank you to Deactivated user and @James for all the help late last night!
I was wondering what to expect when I run the initial sync command. What are the disk space requirements? Does it download files for every day, or just the latest for everything except requests? I don't want to break the server or fill it up!
Thanks!
Joni
@millerjm,
To clarify what Chris wrote a little: each day, only the requests table has new data appended. All of the other files are completely rewritten with each new dump you get.
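In database terms, that means two different load strategies: truncate-and-reload for every table except requests, and append-only for requests. Here's a minimal sketch of that idea using SQLite; the function name, schema handling, and row format are just illustrative assumptions, not anything the CLI gives you:

```python
import sqlite3

def load_table(conn: sqlite3.Connection, table: str, rows: list) -> None:
    """Hypothetical loader: every table except 'requests' arrives as a
    complete dump, so wipe it and reload; 'requests' is incremental,
    so only ever append new rows."""
    if table != "requests":
        conn.execute(f"DELETE FROM {table}")  # full dump: drop yesterday's copy
    marks = ",".join("?" * len(rows[0]))
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    conn.commit()
```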
We're not huge and have only been using Canvas since FA2012. We're sitting at about 8.75 GB compressed and 54.41 GB uncompressed (just over 6 times the compressed size). If you use the CLI to uncompress and merge everything together, you need both of those amounts of space available, plus room for the database's copy of the data.
If you take the approach I mentioned about not combining the files into a single file first, you can save some space. I extract one file at a time, import it into the database, and then remove the extracted version, keeping only the gzipped copy around. It's more work that way, but it saves space. Still, there's a reason they call it "Big Data", so having to worry about hard drive space is probably not a good way to start out.
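Here's a rough sketch of that one-file-at-a-time loop in Python, with assumed paths and file naming (the real dumps are tab-delimited .gz files, one or more per table; your tables are assumed to already exist). Streaming straight out of the gzip file gets the same disk savings without even writing the extracted copy:

```python
import csv
import gzip
import sqlite3
from pathlib import Path

DUMP_DIR = Path("dataFiles")            # assumption: where the CLI put the dumps
conn = sqlite3.connect("canvas_data.db")

for gz_path in sorted(DUMP_DIR.glob("*.gz")):
    table = gz_path.name.split("-")[0]  # assumption about the file naming scheme
    # Stream rows straight out of the gzipped file so the uncompressed
    # copy never lands on disk; only the .gz sticks around afterwards.
    # Plain inserts are fine for the initial load since the tables start
    # empty; for daily refreshes, reuse the replace-vs-append logic above.
    with gzip.open(gz_path, "rt", newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            marks = ",".join("?" * len(row))
            conn.execute(f"INSERT INTO {table} VALUES ({marks})", row)
    conn.commit()

conn.close()
```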
By the way, our requests tables make up 8.06 GB of that 8.75 GB compressed, and 47.53 GB of the 54.41 GB uncompressed. Besides the requests table, we're only pulling about 702 MB a day for the complete set of other files. That number will only go up, because those are complete dumps and more data keeps being added. Our daily requests files average about 48.3 MB each, so after the initial load, we're pulling down about 750 MB a day total.
Finally, you don't have to extract all of the data and put it into a database. You could just load the data you need right now and add more as you go. I did my testing with small amounts because the extraction and combining took so long that I decided to wait until I had everything ready to go.