[ARCHIVED] Canvas Data CLI - What am I missing?
CD-CLI:
Can sync by downloading and keeping every file on the local volume
- consumes a ridiculous amount of disk space
Can fetch individual tables
- downloads all files for that table
- but for the requests table, that means every requests file that hasn't expired
Can download a specific dump by id
- every file
There doesn't seem to be a way to
- Specify a list of tables I need to use and get all files for those tables
- Get the latest Requests files, without downloading previous increments, or every other table in the dump
Are these assumptions correct? Is there another way?
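For reference, here's roughly how those three download modes look from the shell. The table name and dump id are placeholders, and I believe the dump-by-id command is grab, but check the readme for your version:
# sync: download and keep every file locally
canvasDataCli sync -c config.js
# fetch: every file for a single table
canvasDataCli fetch -c config.js -t user_dim
# grab: every file in a specific dump, by id
canvasDataCli grab -c config.js -d <dumpId>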
I'm coming at this a little biased. I currently use @James' canvas data PHP code to load our data. His API is simple and fantastic, and I can pass it an array of tables and skip everything else I don't want. We don't have the space to store every table in the database, and it's silly to store every table file on a local volume just to support a sync operation. I'm also trying to move away from PHP on our task server; many alternatives are better suited to this kind of work. I like the ease of the CLI and the responsive error messages, but it feels incomplete. I might try James' Perl script too; I'm just tinkering with options at the moment.
I also read through Canvas Data CLI: Incremental load of Requests table
I've been working my way around this today with a little bash scripting...
- fetch-tables.txt is just a file with the tables I want to download, one per line (example after this list)
- download all files for each table
- delete any requests files that aren't from the current dump sequence
- unpack
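For example, fetch-tables.txt might look like this (swap in whatever tables you actually need):
user_dim
course_dim
enrollment_dim
requests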
#!/bin/bash
# robert carroll, ccsd-k12-obl
DOWNLOAD_DIR='/canvas/data/files-downloaded'
# clear old files (the glob must sit outside the quotes to expand; :? guards against an unset var)
rm -rf "${DOWNLOAD_DIR:?}"/*
# get the latest schema for unpacking
wget "https://portal.inshosteddata.com/api/schema/latest" -O "$DOWNLOAD_DIR/schema.json"
# read table list into array
mapfile -t TABLES < fetch-tables.txt
# loop through tables array
for i in "${TABLES[@]}"
do
# fetch all files for this table; tag each line of output with the table name
canvasDataCli fetch -c config.js -t "$i" | sed "s/$/: $i/"
if [ "$i" == "requests" ]; then
# get the sequence id of the latest dump (the print() form works under python 2 or 3)
sequence_id=$(canvasDataCli list -c config.js -j | python -c 'import json,sys; print(json.load(sys.stdin)[0]["sequence"])')
# delete all request files not from the latest dump
find "$DOWNLOAD_DIR/$i" -type f ! -name "$sequence_id-*.gz" -delete;
fi
done
# unpack files
echo 'unpacking files'
# expand the table list as separate, space-separated arguments
canvasDataCli unpack -c config.js -f "${TABLES[@]}"
# eof
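Side note: if jq happens to be on the box, that python one-liner for the sequence id could be swapped for something shorter (assuming the same JSON shape from canvasDataCli list -j):
sequence_id=$(canvasDataCli list -c config.js -j | jq -r '.[0].sequence')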
I was having issues unpacking all the files at the end, because the documentation shows the table list comma-separated... but it needs spaces. See canvasDataCli unpacking & adding headers
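In other words, this is the invocation that actually works:
canvasDataCli unpack -c config.js -f user_dim course_dim requests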
The CLI readme also says you can only unpack after a sync operation... which I found is only because unpack needs the schema.json file, which I download with wget near the top of the script.
Next I'd run James' bash import.sh, which I currently use, but with MSSQL swapped in via BCP.
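For anyone curious about the MSSQL side, here's a minimal sketch of the bcp call I mean. The server, database, login, and unpacked file path are all placeholders; the target table has to exist already, and -F 2 skips the header row that unpack adds:
# load one unpacked, tab-delimited table file into MSSQL with bcp
bcp CanvasData.dbo.user_dim in /canvas/data/unpackedFiles/user_dim.txt -S sqlhost -U loader -P "$SQL_PASS" -c -t '\t' -F 2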
I'd love to know how anyone else is dealing with this, or what suggestions you might have.