cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
erik_gustafson
Community Participant

Using Canvas Data with MS SQL Management Studio

Our institution is wanting to take a look at Canvas Data to see what kind of information we can pull out of it. I have looked around the Canvas community quite a bit to learn more about the data portal and what it offers. I have seen how to download and sync data with the Canvas CLI tool, and it seems straightforward. I do have a few questions though to help us get moving forward.

  1. With the CLI tool, does it append the new data from the previous day to the tables, or does it override the old tables with new tables?
    1. How do I get all historical data?
  2. Has anyone tried to automate an import into a database running MS SQL?
    1. If this isn't a good option, then what would be a good alternative? We are wanting to us MS SQL because that is what we are used to using. 

Thank you for your help?

11 Replies
robotcars
Community Champion

Hi  @erik_gustafson ,

There is no delta data or differential sets. You will redownload the same data, plus what is new since the last time you downloaded. Using the Canvas Data CLI Tool, you can use the sync command to keep everything up to date, while the fetch command can be used to download everything for a specific table. The requests table is partial/incremental and has its own historical dump every ~2 months.

SSIS is an option, but I would have had issues maintaining this locally, and didn't like the time constraints if our data department managed it via SSIS. I believe Talend and some other licensed products are available to.

I opted for a Linux VM and some open source. I currently use and support Managing Canvas Data with Embulk, to load CD into MS SQL Server.

Hi carroll-ccsd,

Thank you for the quick reply. So, if I am understanding you correctly, the sync command will keep the tables updated with the latest data, correct?

It sounds like you use Embulk to load data in MS SQL Server instead of SSIS. Is this an automated task? 

Correct-ish. It will download all the gzipped files, and on rerun, will download any new gzipped files it doesn't have. I actually don't like this command for our daily needs. I use the fetch command to get a completely new copy of the data everyday. The sync command has an issue, and I honestly don't understand how others tolerate it. If you use the sync command, any records previously downloaded will not get updated, only records that are new, so you essentially have the original state of many records, not it's current state. For my daily import I rely on the fetch command to redownload and import all 100 tables. I incrementally load the requests table daily using the fetch command as well, and delete the previous sequences before hand. Fetch for requests table downloads a short window into the past. Some details on these steps here, Canvas Data CLI - What am I missing? 

Embulk is just the data loader, and I have documented some useful tips an tricks (in the repo) as well as the known schema for 4 databases. The repo will help you import each set of tables with a command. My actual workflow is a bash script that goes through a text file and downloads each desired table, then another bash script that loads each table into a staging database with a considerable amount of fault tolerance and logging built in. I then use the staged data to load to production with a slightly different purpose. Automation is finished with a scheduled cron job.

nhumphries
Community Participant

Hi,

   At Belfast Metropolitan College we use the Canvas Data Viewer to download the data and import it into a SQ:L Server. We then use Power BI to visualise the data

339641_Power BI Dashboard.PNG

rodm
Community Member

Hello Nigel.  How much work was the setup and how trouble free is the ongoing process?

nhumphries
Community Participant

Hi Rod,

          Its not that difficult to set up and once set up the process is easy to maintain

Nigel

Good to know about Canvas Data Services!  Most of our ETL runs through Python API calls and we were able to get a nightly import setup in a day using this Python library: https://github.com/Harvard-University-iCommons/canvas-data-sdk

Nigel - any chance you have written views to join all the star schema that you are interested in sharing?

reynlds
Community Participant

@nhumphries I'd like to get a look "behind the BI curtain" also (if possible). Thanks.

nhumphries
Community Participant

@reynlds no problem. We have moved on since I put up this postCanvasData Portal.PNG