Correct-ish. It will download all the gzipped files, and on rerun it will download any new gzipped files it doesn't already have. I actually don't like this command for our daily needs; I use the fetch command to get a completely new copy of the data every day. The sync command has an issue, and I honestly don't understand how others tolerate it: any records previously downloaded will not get updated, only new records come down, so you essentially have the original state of many records, not their current state. For my daily import I rely on the fetch command to redownload and import all 100 tables. I incrementally load the requests table daily using the fetch command as well, deleting the previous sequences beforehand (fetch for the requests table only downloads a short window into the past). Some details on these steps are here: Canvas Data CLI - What am I missing?
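As a rough sketch of that daily fetch step, assuming the canvas-data-cli npm package is installed and configured (the table name and config path below are just placeholders):

```bash
#!/usr/bin/env bash
# Daily refresh sketch: fetch a complete, current copy of one table, then unpack it.
# Assumes canvas-data-cli is installed (npm install -g canvas-data-cli) and
# config.js holds your Canvas Data credentials; both paths are hypothetical.
set -euo pipefail

CONFIG=./config.js   # hypothetical path to your CLI config
TABLE=user_dim       # example table; repeat for each table you need

# fetch redownloads every file for the table, so reruns always reflect
# the current state (unlike sync, which skips previously seen files)
canvasDataCli fetch -c "$CONFIG" -t "$TABLE"

# unpack concatenates the gzipped parts into a single TSV with headers
canvasDataCli unpack -c "$CONFIG" -f "$TABLE"
```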
Embulk is just the data loader, and I have documented some useful tips and tricks (in the repo) as well as the known schema for 4 databases. The repo will help you import each set of tables with a single command. My actual workflow is a bash script that goes through a text file and downloads each desired table, then another bash script that loads each table into a staging database, with a considerable amount of fault tolerance and logging built in. I then use the staged data to load production, which serves a slightly different purpose. Automation is finished with a scheduled cron job; a rough sketch of the download-and-load loop is below.
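For illustration only, here is a minimal sketch of that loop, assuming a tables.txt with one table name per line and a per-table Embulk config at embulk/&lt;table&gt;.yml (the file names, paths, and log format are all hypothetical; `canvasDataCli` and `embulk run` are the real commands):

```bash
#!/usr/bin/env bash
# Sketch of the download-then-stage loop. tables.txt, config.js, and the
# per-table Embulk configs (embulk/<table>.yml) are hypothetical names.
set -uo pipefail

CONFIG=./config.js
LOG=./import.log

while read -r table; do
  [ -z "$table" ] && continue   # skip blank lines

  echo "$(date) fetching $table" >> "$LOG"
  if ! canvasDataCli fetch -c "$CONFIG" -t "$table" ||
     ! canvasDataCli unpack -c "$CONFIG" -f "$table"; then
    echo "$(date) FAILED download: $table" >> "$LOG"
    continue                    # keep going; this table can be retried later
  fi

  # load the unpacked TSV into the staging database via Embulk
  if ! embulk run "embulk/${table}.yml" >> "$LOG" 2>&1; then
    echo "$(date) FAILED load: $table" >> "$LOG"
  fi
done < tables.txt
```

A crontab entry pointing at a script like this (for example, running nightly before business hours) would cover the scheduling piece.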