How to Use the Canvas Data CLI Tool

Document created by Sydney McCann (Employee) on Apr 6, 2016. Last modified by Sydney McCann (Employee) on May 18, 2017.
Version 16

    Official Canvas Document


 


Overview

 

A small Command Line Interface (CLI) tool for syncing data from the Canvas Data API.

NOTE: This tool is currently in beta, but it is supported and welcomes contributions. Please report any bugs or issues you find!

Benefits of this tool compared to manually downloading files:

  • It pulls the flat files for you, so you don't have to download every table manually (sync command).

  • It automatically adds the correct headers (unpack command, after a successful sync).

  • It merges files for you, so you end up with a single file for each fact and dim table (unpack command, after a successful sync).

  • It lets you pull specific tables instead of the whole schema (fetch command).

  • It lets you pull a single dump instead of all of them (list and grab commands).

 

Install

 

All of the following steps are done in your terminal (OSX) or command prompt (Windows).

A video tutorial of this can be found here: [Windows] How to Install the Canvas Data CLI Tool

 

Prerequisites

This tool should work on Linux, OSX, and Windows. It runs on the Node.js runtime, which you need to install before you can use the tool.

1. Install Node.js. Any version newer than 0.12.0 should work; your best bet is to follow the instructions here
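If you are unsure whether your installed Node.js is recent enough, a small script like the one below can compare process.version against the 0.12.0 minimum mentioned above. It is a hypothetical helper (isNewerThan and parseVersion are not part of canvas-data-cli), written in older ES5 syntax so that it still parses on old Node.js releases.

```javascript
// Hypothetical helper (not part of canvas-data-cli): checks whether the
// running Node.js is newer than the 0.12.0 minimum mentioned in this guide.
// Written in ES5 so it also runs on old Node.js versions.

// Parse "v6.9.1" or "6.9.1" into [major, minor, patch].
function parseVersion(version) {
  return version.replace(/^v/, "").split(".").map(function (part) {
    return parseInt(part, 10);
  });
}

// Returns true only if `version` is strictly newer than `minimum`.
function isNewerThan(version, minimum) {
  var a = parseVersion(version);
  var b = parseVersion(minimum);
  for (var i = 0; i < 3; i++) {
    var x = a[i] || 0;
    var y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return false; // equal versions are not "newer"
}

console.log("Running Node.js " + process.version);
console.log(isNewerThan(process.version, "0.12.0")
  ? "This version should work with canvas-data-cli."
  : "Consider upgrading Node.js before installing canvas-data-cli.");
```

Save it under any name (for example, check-node.js) and run it with node check-node.js.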

 

Install via npm (preferred)

npm install -g canvas-data-cli

If the install fails, check that npm is installed by running npm -v

 

Configure

 

The Canvas Data CLI requires a configuration file with certain fields set. This configuration file is a small JavaScript file. To generate it:

1. Run

Run canvasDataCli sampleConfig, which prints a sample configuration to your terminal.

 

If you are unable to run this command, please try running: 

npm uninstall -g canvas-data-cli && npm cache clean && npm install -g canvas-data-cli@0.5.4

 

2. Save file

Save this output to a file with a .js extension (e.g. config.js).

 

3. Edit 'Save Location'

Within the file, edit the saveLocation and unpackLocation to point to where you want to save the Canvas Data output files.

Example #1: saveLocation: '/Users/PandaUser/Desktop/dataFiles'

Example #2: saveLocation: '/Users/PandaUser/Documents/Canvas_Data_Ex/dataFiles'

 

4. Generate API Credentials

See the Canvas Data API Guide for instructions on generating API credentials. Once you have your key and secret, do one of the following:

 

A. Hard Coding Credentials (easier, but less secure)

    1. Open your config.js file from step 2

    2. Remove process.env.CD_API_SECRET and process.env.CD_API_KEY.
    3. Replace them with the secret and key you generated in your Canvas Data instance, surrounded by double quotes.

The end result should look like this:

module.exports = {
  saveLocation: '/Users/PandaUser/Desktop/canvas_data/dataFiles',
  unpackLocation: '/Users/PandaUser/Desktop/canvas_data/unpackedFiles',
  apiUrl: 'https://api.inshosteddata.com/api',
  // Replace the placeholders with your own Canvas Data key and secret
  key: "<your_canvas_data_key>",
  secret: "<your_canvas_data_secret>",
}

 

B. Store Credentials In Environment Variables (more secure)

    1. OSX

      1. In the same terminal window, or a new terminal tab, run nano ~/.bash_profile
      2. Type export CD_API_KEY='<your_canvas_data_key>'

      3. Press Enter

      4. Type export CD_API_SECRET='<your_canvas_data_secret>'

        1. Some systems may require single or double quotes around the key and secret
      5. Press Control+O (the letter O, as in otter) to save

      6. Press Enter to confirm the file name
      7. Press Control+X to exit nano
      8. Restart your terminal (or run source ~/.bash_profile) so the new variables take effect

    2. WINDOWS

      1. See the guide to setting environment variables, or view the video tutorial on this step: [Windows] How to Configure Environment Variables for Canvas Data CLI Tool
      2. The variable names are CD_API_KEY and CD_API_SECRET.
      3. Their values are your Canvas Data key and secret, respectively.
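After restarting your terminal, you can confirm that Node.js (and therefore a config.js that reads process.env) can see the two variables. This is a hypothetical one-off check, not a CLI command; it reports whether each variable is set without printing the key or secret itself.

```javascript
// Hypothetical check (not part of canvas-data-cli): reports whether the
// CD_API_KEY and CD_API_SECRET environment variables are visible to Node.js.
// Only "set" / "NOT set" is printed, never the values themselves.
var VARIABLES = ["CD_API_KEY", "CD_API_SECRET"];

function reportEnvironment(env) {
  return VARIABLES.map(function (name) {
    var isSet = typeof env[name] === "string" && env[name].length > 0;
    return name + ": " + (isSet ? "set" : "NOT set");
  });
}

reportEnvironment(process.env).forEach(function (line) {
  console.log(line);
});
```

Save it as, say, check-env.js and run node check-env.js in the same terminal or command prompt you will use for the CLI.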

 

You can also view a video tutorial here on how to accomplish this step: How to configure the Canvas Data CLI tool and

 

Use the CLI Tool

 

Besides sampleConfig, the CLI tool has five built-in commands:

  • Sync
  • Fetch
  • Unpack
  • List
  • Grab

 

Sync

 

If you simply want to download all the data from Canvas Data, use the sync command. Run daily, it keeps your local copy of Canvas Data up to date.

Video tutorial on this process can be found here: How to Sync Canvas Data with the Canvas Data CLI Tool and [Windows] How to Schedule an Automated Download with the Canvas Data CLI Tool

 

canvasDataCli sync -c path/to/config.js

Example: canvasDataCli sync -c ~/Desktop/config.js

Example: canvasDataCli sync -c /Users/PandaUser/Desktop/config.js

  

This will start the sync process. On the first sync, it will look through all the data exports and download only the latest version of any tables that are not marked as partial. It will also download any files from older exports to complete a partial table.

 

On subsequent executions it will:

  1. Check for data exports newer than the last recorded export
  2. Delete the old copy of any table that is NOT a partial table
  3. Append new files for partial tables
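The first-sync rule described above (take only the newest copy of a non-partial table, and for a partial table keep pulling files from older exports until the table is complete) can be illustrated with the sketch below. It is not the CLI's actual implementation: the dump and table objects are hypothetical, simplified shapes, and it assumes that "completing" a partial table means walking back to the most recent export where that table is not partial.

```javascript
// Illustrative sketch only, NOT canvas-data-cli's code.
// Assumed (hypothetical) input: dumps ordered newest first, each shaped like
//   { dumpId: "...", tables: { user_dim: { partial: false, files: ["..."] } } }
// For each table, walk from the newest dump backwards, collecting files,
// and stop at the first dump where that table is not partial.
function planFirstSync(dumps) {
  var plan = {};     // table name -> list of files to download
  var complete = {}; // table name -> true once a non-partial copy is found

  dumps.forEach(function (dump) {
    Object.keys(dump.tables).forEach(function (name) {
      if (complete[name]) return; // already have a full copy of this table
      var table = dump.tables[name];
      plan[name] = (plan[name] || []).concat(table.files);
      if (!table.partial) complete[name] = true;
    });
  });
  return plan;
}

module.exports = { planFirstSync: planFirstSync };
```

With three dumps where a non-partial table appears in each, only the newest dump's files are planned for it; a table that is partial in the two newest dumps gets files from all three.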

 

Fetch

 

The fetch command downloads the most up-to-date data for a single table from the API. It ignores any previously downloaded files and re-downloads all the files associated with that table.

canvasDataCli fetch -c path/to/config.js -t user_dim

 

Example: canvasDataCli fetch -c ~/Desktop/config.js -t user_dim

Example: canvasDataCli fetch -c /Users/PandaUser/Desktop/config.js -t user_dim

 

This will start the fetch process and download what is needed to get the most recent data for that table (in this case, the user_dim).

 

On subsequent executions, this will re-download all the data for that table, ignoring any previous day's data.
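fetch takes a single table with -t. If you need several tables, one option is to call the CLI once per table from a small Node.js script. The sketch below assumes canvasDataCli is installed globally and on your PATH; fetchTables and runCli are hypothetical names, and the optional run parameter exists only so the loop can be tested without the CLI installed.

```javascript
// Sketch: run `canvasDataCli fetch -c <config> -t <table>` once per table.
// Assumes canvasDataCli is installed globally and available on PATH.
var childProcess = require("child_process");

// Default runner: executes the real CLI and returns its exit code.
// If the command cannot be started, status is null and the table counts as failed.
// On Windows you may need to add { shell: true } so the npm .cmd shim is found.
function runCli(args) {
  var result = childProcess.spawnSync("canvasDataCli", args, { stdio: "inherit" });
  return result.status;
}

// Fetches each table in turn and returns the tables whose fetch failed.
function fetchTables(configPath, tables, run) {
  run = run || runCli;
  return tables.filter(function (table) {
    var status = run(["fetch", "-c", configPath, "-t", table]);
    return status !== 0;
  });
}

module.exports = { fetchTables: fetchTables };
```

For example, fetchTables("/Users/PandaUser/Desktop/config.js", ["user_dim", "course_dim"]) runs two fetches one after the other and returns an empty array if both succeed. Running them sequentially keeps the load on the API and your disk predictable.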

 

Unpack

 

NOTE: This only works after you have successfully run the sync command.

This command unpacks the gzipped files, concatenates any partitioned files, and adds a header row to the output file.

canvasDataCli unpack -c path/to/config.js -f user_dim account_dim

Example: canvasDataCli unpack -c ~/Desktop/config.js -f user_dim

Example: canvasDataCli unpack -c /Users/PandaUser/Desktop/config.js -f submission_dim course_dim

 

The first command above unpacks the user_dim and account_dim tables into the unpackLocation directory set in your config file.

 

Currently, you must explicitly list the tables you want to unpack, because unpacking can create very large files.
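The three steps unpack performs (decompress, concatenate partitions, add a header) can be sketched with Node's built-in zlib and fs modules. This illustrates the idea and is not the CLI's code: unpackTable is a hypothetical name, the caller supplies the column names (the real command works out the header itself), and it assumes tab-separated parts that each end with a newline.

```javascript
// Illustrative sketch of what "unpack" does; not canvas-data-cli's code.
// Assumptions: the .gz parts are tab-separated text, each part ends with a
// newline, and the column names are supplied by the caller.
var fs = require("fs");
var zlib = require("zlib");

function unpackTable(gzipFiles, columns, outputFile) {
  // 1. Write the header row first (this overwrites any existing output).
  fs.writeFileSync(outputFile, columns.join("\t") + "\n");
  // 2. Decompress each partition and 3. append it to the output, in order.
  gzipFiles.forEach(function (file) {
    var data = zlib.gunzipSync(fs.readFileSync(file));
    fs.appendFileSync(outputFile, data);
  });
}

module.exports = { unpackTable: unpackTable };
```

This version reads each part fully into memory, which is fine for small tables; for very large tables, a streaming approach (piping zlib.createGunzip() into a write stream) would be the safer design.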

 

List

 

This command lists all data dumps that are available for download. Its intended use is to find the ID of the dump you want, then download that dump with the grab command.

canvasDataCli list -c path/to/config.js 

 

Example: canvasDataCli list -c ~/Desktop/config.js

Example: canvasDataCli list -c /Users/PandaUser/Desktop/config.js

 

Grab

 

This command downloads the data dump with the given dump ID. It creates a directory named after the dump ID inside the saveLocation directory (dataFiles in the examples above) specified in your config.js file. You can then use the unpack command to uncompress the tables you specify into the unpackedFiles directory.

canvasDataCli grab -c path/to/config.js -d id_number_of_data_dump

 

Example: canvasDataCli grab -c ~/Desktop/config.js -d 123492138498123

Example: canvasDataCli grab -c /Users/PandaUser/Desktop/config.js -d 0912342397412
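Since grab writes into a directory named after the dump ID under your saveLocation, a short script can show what it downloaded. This is a hypothetical helper, not part of canvas-data-cli; it assumes the downloaded files end in .gz and may sit in nested subdirectories, which is an assumption about the layout rather than documented behavior.

```javascript
// Hypothetical helper (not part of canvas-data-cli): lists the .gz files
// inside <saveLocation>/<dumpId>, searching subdirectories as well.
// Assumption: downloaded files end in ".gz".
// Throws if the directory does not exist (e.g. grab was not run for that ID).
var fs = require("fs");
var path = require("path");

function listGrabbedFiles(saveLocation, dumpId) {
  var root = path.join(saveLocation, String(dumpId));
  var found = [];

  function walk(dir) {
    fs.readdirSync(dir).forEach(function (name) {
      var full = path.join(dir, name);
      if (fs.statSync(full).isDirectory()) {
        walk(full);
      } else if (name.slice(-3) === ".gz") {
        found.push(path.relative(root, full)); // path relative to the dump directory
      }
    });
  }

  walk(root);
  return found.sort();
}

module.exports = { listGrabbedFiles: listGrabbedFiles };
```

For example, listGrabbedFiles("/Users/PandaUser/Desktop/canvas_data/dataFiles", "123492138498123") would list the files downloaded by the first grab example above, assuming that saveLocation.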
