
Canvas Data CLI -  What am I missing?

Question asked by Robert Carroll on Nov 20, 2018
Latest reply on May 20, 2019 by James Jones


Can sync by downloading and keeping every file on the local volume

   - consumes a ridiculous amount of disk space

Can fetch individual tables

   - downloads all files for that table

   - but for the requests table, downloads every requests file that hasn't expired

Can download a specific dump by id

   - every file in the dump


There doesn't seem to be a way to

- Specify a list of tables I need to use and get all files for those tables

- Get the latest Requests files, without downloading previous increments, or every other table in the dump
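As a workaround sketch for that second gap, you can fetch and then prune every requests file that isn't from the newest sequence. Here's the pruning step in isolation, against a throwaway directory; the file names are made up to mimic the <sequence>-<part>.gz pattern, so treat them as an assumption:

```shell
# demo: keep only files from the latest dump sequence
# file names below are invented to mimic the <sequence>-<part>.gz pattern
dir=$(mktemp -d)
touch "$dir/411-part-00000.gz" "$dir/411-part-00001.gz" "$dir/412-part-00000.gz"

sequence_id=412   # in practice this would come from `canvasDataCli list -c config.js -j`
# delete everything whose name doesn't start with the latest sequence id
find "$dir" -type f ! -name "$sequence_id-*.gz" -delete

ls "$dir"   # only 412-part-00000.gz remains
```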


Are these assumptions correct? Is there another way?


I'm coming at this a little biased. I currently use James Jones' Canvas Data PHP code to load our data. His API is simple and fantastic: I can pass an array of tables and skip over everything else I don't want. We don't have the space to store every table in the database, and it's silly to store every table file on a local volume just to handle a sync operation. I'm trying to move away from PHP on our task server; many alternatives are better suited for this type of thing. I like the ease of the CLI and its responsive error messages, but it feels incomplete. I might try James' Perl script too; I'm just tinkering with options at the moment.


I also read through Canvas Data CLI: Incremental load of Requests table


I've been working my way around this today with a little bash scripting...

   - fetch-tables.txt is just a file listing the tables I want to download, one per line

   - download all files for each table

   - delete the requests files that aren't from the current dump sequence

   - unpack

# robert carroll, ccsd-k12-obl

# clear old files (the glob must sit outside the quotes to expand)
rm -rf "$DOWNLOAD_DIR"/*
# get the latest schema for unpacking (URL omitted here)
wget "" -O "$DOWNLOAD_DIR/schema.json"

# read table list into array
mapfile -t TABLES < fetch-tables.txt
# loop through tables array
for i in "${TABLES[@]}"; do
  # fetch table files, tagging each output line with the table name
  canvasDataCli fetch -c config.js -t "$i" | sed "s/$/: $i/g"
  if [ "$i" == "requests" ]; then
    # get the most recent sequence id (latest dump)
    sequence_id=$(canvasDataCli list -c config.js -j | python -c 'import json,sys; print(json.load(sys.stdin)[0]["sequence"])')
    # delete all requests files not from the latest dump
    find "$DOWNLOAD_DIR/$i" -type f ! -name "$sequence_id-*.gz" -delete
  fi
done

# unpack files
echo 'unpacking files'
# join the array with spaces; unpack wants a space-separated table list
UNPACK="${TABLES[*]}"
canvasDataCli unpack -c config.js -f $UNPACK

# eof
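The sequence-id extraction can be checked in isolation by piping a mocked `list -j` output through the same one-liner (using python3 here; the JSON field names and newest-first ordering are assumptions based on how the script uses them, not a verified description of the real output):

```shell
# mocked output of `canvasDataCli list -c config.js -j`; newest dump first (assumed)
LIST_JSON='[{"sequence": 412}, {"sequence": 411}]'
# same extraction as in the script: first element's "sequence" field
sequence_id=$(echo "$LIST_JSON" | python3 -c 'import json,sys; print(json.load(sys.stdin)[0]["sequence"])')
echo "$sequence_id"   # 412
```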

I was having issues with unpacking all the files at the end, because the documentation shows comma separation for the tables... but it needs spaces. See: canvasDataCli unpacking & adding headers
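For reference, bash will join an array with spaces via plain `[*]` expansion, which is the form unpack needs (the table names below are just examples):

```shell
TABLES=(account_dim course_dim requests)
# "${TABLES[*]}" joins elements with the first character of IFS (a space by default)
UNPACK="${TABLES[*]}"
echo "$UNPACK"   # account_dim course_dim requests
# then pass it unquoted so each table becomes its own argument:
# canvasDataCli unpack -c config.js -f $UNPACK
```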

The CLI readme also says you can only unpack after a sync operation... which I found is only because unpack needs the schema.json file, which my script downloads up front with wget.


Next I'd be using James' bash script, which I currently use but with MSSQL swapped in via MS/BCP.


I'd love to know how anyone else is dealing with this, or what suggestions you might have.