How to Use the CLI Data Tool
Overview
A small Command Line Interface (CLI) tool for syncing data from the Canvas Data API.
Benefits of this tool compared to manually downloading files:
- It pulls the flat files for you, so you don't have to manually download all the tables (using the sync command)
- It automatically adds the correct headers (using the unpack command after successfully running the sync command)
- It merges files for you so you only have one fact and one dim table per companion table (using the unpack command after successfully running the sync command)
- Allows you to pull specific tables, not the whole schema (using the fetch command)
- Allows you to pull just one dump, not all of them (using the grab/list command)
Install
All of this needs to be done through your terminal (OSX/Linux), Command Prompt (Windows), or the Bash terminal in the Windows Subsystem for Linux.
A video tutorial of this can be found here: [Windows] How to Install the Canvas Data CLI Tool
Prerequisites
This tool should work on Linux, OSX, and Windows. The tool uses the Node.js runtime, which you will need to install before being able to use it.
1. Install Node.js - Any version newer than 0.12.0 should work; the best bet is to follow the instructions here
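To confirm that an installed Node.js meets the minimum stated above, a short script can compare version numbers. This is only a sketch: the helper name isNewerThan is made up for illustration, while process.versions.node is the standard Node.js property holding the running version. It sticks to var so that it also runs on very old Node.js releases.

```javascript
// Hypothetical helper: compare two dotted version strings numerically.
function isNewerThan(version, minimum) {
  var a = version.split('.').map(Number);
  var b = minimum.split('.').map(Number);
  for (var i = 0; i < Math.max(a.length, b.length); i++) {
    var x = a[i] || 0;
    var y = b[i] || 0;
    if (x !== y) {
      return x > y;
    }
  }
  return false; // identical versions are not "newer"
}

// process.versions.node is the running version without the leading "v".
var current = process.versions.node;
if (isNewerThan(current, '0.12.0')) {
  console.log('Node.js ' + current + ' should work with the tool.');
} else {
  console.log('Node.js ' + current + ' is too old; install a newer release.');
}
```

Save it as, for example, check-node.js and run it with node check-node.js.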
Install via npm (preferred)
npm install -g canvas-data-cli
If the install fails, check that npm is installed by running npm -v
Configure
The Canvas Data CLI requires a configuration file with certain fields set. Canvas Data CLI uses a small javascript file as the configuration file. To generate this configuration:
1. Run
Run canvasDataCli sampleConfig, which prints the sample configuration to your terminal.
If you are unable to run this command, please try running:
npm uninstall -g canvas-data-cli && npm cache clean && npm install -g canvas-data-cli@0.5.4
2. Save file
Save the printed configuration to a file with a .js extension (e.g. config.js)
3. Edit 'Save Location'
Within the file, edit the saveLocation and unpackLocation to point to where you want to save the Canvas Data output files.
Example #1: saveLocation: '/Users/PandaUser/Desktop/dataFiles'
Example #2: saveLocation: '/Users/PandaUser/Documents/Canvas_Data_Ex/dataFiles'
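Because the configuration file is ordinary JavaScript, the two locations can also be built from the current user's home directory instead of being typed out. The sketch below assumes a canvas_data folder on the Desktop (an illustrative choice, not something the tool requires); os.homedir() and path.join() are Node.js built-ins.

```javascript
const os = require('os');
const path = require('path');

// Assumed layout: a canvas_data folder on the current user's Desktop.
const baseDir = path.join(os.homedir(), 'Desktop', 'canvas_data');

// These two values could be used for the saveLocation and unpackLocation fields.
const locations = {
  saveLocation: path.join(baseDir, 'dataFiles'),
  unpackLocation: path.join(baseDir, 'unpackedFiles'),
};

console.log(locations);
```

Using path.join keeps the separators correct on both Windows and OSX/Linux.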
4. Generate API Credentials
View how to generate Canvas API credentials. Once you have these, you must do one of the following:
A. Hard Coding Credentials (easier, but less secure)
- Open your config.js file from step 2
- Remove process.env.CD_API_SECRET and process.env.CD_API_KEY
- Replace them with the secret and key you generated from your Canvas Data instance, surrounded by double quotes.
End result should appear like this:
module.exports = {
  saveLocation: '/Users/PandaUser/Desktop/canvas_data/dataFiles',
  unpackLocation: '/Users/PandaUser/Desktop/canvas_data/unpackedFiles',
  apiUrl: 'https://api.inshosteddata.com/api',
  key: "<your_canvas_data_key>",
  secret: "<your_canvas_data_secret>",
}
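Before running any command, it can help to confirm that none of the fields in the configuration were left empty. The following is a minimal sketch, assuming the five fields shown in the example are the ones that matter; missingFields is a hypothetical helper, not part of the CLI.

```javascript
// Field names taken from the example configuration above.
const REQUIRED_FIELDS = ['saveLocation', 'unpackLocation', 'apiUrl', 'key', 'secret'];

// Hypothetical helper: return the names of fields that are missing or empty.
function missingFields(config) {
  return REQUIRED_FIELDS.filter(
    (name) => typeof config[name] !== 'string' || config[name].trim() === ''
  );
}

// Example: a configuration whose secret was never filled in.
const sample = {
  saveLocation: '/Users/PandaUser/Desktop/canvas_data/dataFiles',
  unpackLocation: '/Users/PandaUser/Desktop/canvas_data/unpackedFiles',
  apiUrl: 'https://api.inshosteddata.com/api',
  key: '<your_canvas_data_key>',
  secret: undefined,
};

console.log(missingFields(sample)); // [ 'secret' ]
```

To check a real file, pass it in with missingFields(require('./config.js')), adjusting the path to wherever the file was saved.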
B. Store Credentials in Environment Variables (more secure)
- OSX
  - In the same terminal window, or a new terminal tab, enter nano ~/.bash_profile
  - Type in export CD_API_KEY='<your_canvas_data_key>'
  - Enter
  - Type in export CD_API_SECRET='<your_canvas_data_secret>'
    - Some computers may require single or double quotes around the key and secret
  - Control + o (as in otter)
  - Enter
  - Control + x
  - Restart terminal
- WINDOWS
  - Guide to setting environment variables or you can view a video tutorial on accomplishing this here: [Windows] How to Configure Environment Variables for Canvas Data CLI Tool
  - The variable names will be CD_API_KEY and CD_API_SECRET.
  - The variable values will be the Canvas Data key and secret that belong to those names.
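Once the variables are set, a quick way to confirm that Node.js can see them is to read them back from process.env, the same object the configuration file reads from. This is a sketch only; requireEnv and credentialsFromEnv are hypothetical helper names, and the example call uses a stand-in object rather than real credentials.

```javascript
// Hypothetical helper: read a required environment variable or fail loudly.
function requireEnv(name, env) {
  const value = (env || process.env)[name];
  if (!value) {
    throw new Error('Environment variable ' + name + ' is not set');
  }
  return value;
}

// Build the credential part of the configuration from the environment.
function credentialsFromEnv(env) {
  return {
    key: requireEnv('CD_API_KEY', env),
    secret: requireEnv('CD_API_SECRET', env),
  };
}

// Example with a stand-in environment object instead of the real process.env.
console.log(credentialsFromEnv({ CD_API_KEY: 'example-key', CD_API_SECRET: 'example-secret' }));
```

Calling credentialsFromEnv() with no argument reads the real environment and throws if either variable is missing, which usually means the terminal was not restarted after the variables were added.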
Use the CLI Tool
The CLI tool has the following built-in commands for working with data:
- Sync
- Fetch
- Unpack
- List
- Grab
- Historical Requests
Sync
If you simply want to download all the data from Canvas Data, use the sync command. If run daily, it keeps your local copy of the data up to date.
canvasDataCli sync -c path/to/config.js
Example: canvasDataCli sync -c ~/Desktop/config.js
Example: canvasDataCli sync -c /Users/PandaUser/Desktop/config.js
This will start the sync process. On the first sync, it will look through all the data exports and download only the latest version of any tables that are not marked as partial. It will also download any files from older exports to complete a partial table.
On subsequent executions it will:
- Check for newest data exports after the last recorded export
- Delete any old tables if the table is NOT a partial table
- Append new files for partial tables.
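For a daily run, the sync command can be wrapped in a small Node.js script and handed to a scheduler such as cron or Windows Task Scheduler. The sketch below is one possible wrapper, not part of the tool: syncCommand and runSync are hypothetical names, and the config path is the example path used above. execSync is the standard child_process call; it runs the command through the system shell, so the path is wrapped in double quotes in case it contains spaces.

```javascript
const { execSync } = require('child_process');

// Build the same command line shown above, quoting the config path.
function syncCommand(configPath) {
  return 'canvasDataCli sync -c "' + configPath + '"';
}

// Run one sync and report whether it succeeded. Not called automatically here.
function runSync(configPath) {
  try {
    execSync(syncCommand(configPath), { stdio: 'inherit' });
    return true;
  } catch (err) {
    console.error('Sync failed: ' + err.message);
    return false;
  }
}

// Print the command that runSync would execute for the example path.
console.log(syncCommand('/Users/PandaUser/Desktop/config.js'));
```

A scheduled script would end with a call such as runSync('/Users/PandaUser/Desktop/config.js') and could use the returned boolean to decide whether to send an alert.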
Fetch
Fetches the most up-to-date data for a single table from the API. This ignores any previously downloaded files and will re-download all the files associated with that table.
canvasDataCli fetch -c path/to/config.js -t user_dim
Example: canvasDataCli fetch -c ~/Desktop/config.js -t user_dim
Example: canvasDataCli fetch -c /Users/PandaUser/Desktop/config.js -t user_dim
This will start the fetch process and download what is needed to get the most recent data for that table (in this case, the user_dim).
On subsequent executions, this will re-download all the data for that table, ignoring any previous day's data.
Unpack
NOTE: This only works after properly running a sync command
This command will unpack the gzipped files, concatenate any partitioned files, and add a header to the output file.
canvasDataCli unpack -c path/to/config.js -f user_dim account_dim
Example: canvasDataCli unpack -c ~/Desktop/config.js -f user_dim
Example: canvasDataCli unpack -c /Users/PandaUser/Desktop/config.js -f submission_dim course_dim
This command will unpack the user_dim and account_dim tables to the unpackLocation directory set in your config file.
Currently, you have to explicitly name the files you want to unpack, as unpacking has the potential to create very large files.
List
This command will list all data dumps that are available to be downloaded. Its main use is to find the ID of the data dump you want, which you can then pass to the grab command.
canvasDataCli list -c path/to/config.js
Example: canvasDataCli list -c ~/Desktop/config.js
Example: canvasDataCli list -c /Users/PandaUser/Desktop/config.js
Grab
This command will download a data dump based on the dump id provided. A directory with the same name as the dump id will be created within the dataFiles path (saveLocation) specified in the config.js file. The unpack command can then be used to uncompress the specified tables into the unpackedFiles directory.
canvasDataCli grab -c path/to/config.js -d id_number_of_data_dump
Example: canvasDataCli grab -c ~/Desktop/config.js -d 123492138498123
Example: canvasDataCli grab -c /Users/PandaUser/Desktop/config.js -d 0912342397412
Historical Requests
Periodically, requests data is regrouped into collections that span more than a single day. In this case, the date that the files were generated differs from the time that the included requests were made. To make it easier to identify which files contain the requests made during a particular time range, the tool provides the historical-requests subcommand.
canvasDataCli historical-requests -c config.js
Its output takes the form:
{
  "dumpId": "...",
  "ranges": {
    "20180315_20180330": [
      {
        "url": "...",
        "filename": "..."
      },
      {
        "url": "...",
        "filename": "..."
      }
    ],
    "20180331_20180414": [
      {
        "url": "...",
        "filename": "..."
      }
    ]
  }
}
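Given output in this shape, finding the files that cover a particular day comes down to parsing the range keys. The sketch below assumes the keys always look like YYYYMMDD_YYYYMMDD and that both ends of a range are inclusive; neither is confirmed here. filesCoveringDay is a hypothetical helper, and the dump id and filenames in the sample data are made up.

```javascript
// Hypothetical helper: list the filenames whose range key covers the given day.
// Assumes keys look like "YYYYMMDD_YYYYMMDD" and that both ends are inclusive.
function filesCoveringDay(output, day) {
  const matches = [];
  for (const [rangeKey, files] of Object.entries(output.ranges)) {
    const [start, end] = rangeKey.split('_');
    // Fixed-width YYYYMMDD strings sort chronologically as plain strings.
    if (start <= day && day <= end) {
      matches.push(...files.map((file) => file.filename));
    }
  }
  return matches;
}

// Stand-in data in the shape shown above; the filenames are made up.
const sampleOutput = {
  dumpId: 'example-dump',
  ranges: {
    '20180315_20180330': [
      { url: '...', filename: 'requests-a.gz' },
      { url: '...', filename: 'requests-b.gz' },
    ],
    '20180331_20180414': [{ url: '...', filename: 'requests-c.gz' }],
  },
};

console.log(filesCoveringDay(sampleOutput, '20180401')); // [ 'requests-c.gz' ]
```

With real output, the object passed in would be the result of JSON.parse on whatever the historical-requests subcommand printed.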
Is there a way to exclude the Request files when running the Sync command? I'm currently running the Canvas Data CLI tool in Powershell if that helps. If there is no way to do it while running the Sync command, would removing the requests folder from the SaveLocation directory, or maybe removing the "Requests" object from the schema file achieve the same result?
The fetch command allows you to specify a table to fetch, so you can run it multiple times to fetch every table besides the requests table. The sync command has no option to omit tables.
The schema is fetched as part of the sync process, so it wouldn't do any good to remove it there.
If the requests folder is missing, the tool will re-create it and attempt to download all of the information. Some people have replaced the files with 0 byte files with the same name to keep the storage requirements down, but the tool would still need to download them the first time.
The source code is JavaScript, and one option is to go into the code and specifically keep requests from downloading. I haven't tested this, but the processFile function in the Sync.js file seems like a good place to match on the filename and return if it matches requests rather than continuing.
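The run-fetch-once-per-table approach described above can be scripted. This is a rough sketch: fetchCommands is a hypothetical helper, and the table list is illustrative only, since the real list of tables comes from your own Canvas Data schema. It only builds the command lines and does not run them.

```javascript
// Build one fetch command per table, skipping any table named in `excluded`.
function fetchCommands(tables, configPath, excluded) {
  return tables
    .filter((table) => !excluded.includes(table))
    .map((table) => 'canvasDataCli fetch -c "' + configPath + '" -t ' + table);
}

// Illustrative table list only; the real list comes from your Canvas Data schema.
const tables = ['user_dim', 'account_dim', 'course_dim', 'requests'];

fetchCommands(tables, 'config.js', ['requests']).forEach((cmd) => console.log(cmd));
```

The printed lines can be pasted into a terminal or a PowerShell script, one fetch per table, leaving requests out.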
@James I'm new to this CLI and API functionality but I'm hitting a stumbling block. Step 4 asks to create a data portal key and secret, but I don't see that option anywhere.
Additionally, the names of some of these features seem to have changed? In the documentation it's 'Canvas Data Portal' but ours is listed as just 'Data Services'.
Am I in the wrong spot? Who can I contact to get additional support? Thanks!
You will need to contact your CSM to enable Canvas Data Portal for you. Refer to this article for more information https://community.canvaslms.com/t5/Admin-Guide/What-is-Canvas-Data-Services/ta-p/142
re: Canvas Data CLI for Canvas Data 1. Is there a way to download data from the Beta site instead of production? We're onboarding with Canvas and want to create test data in the Canvas Beta site to see how it gets saved in CD1. (We're tracking CD2 on the way, but we do not have access yet.)
-Doug