Community Help

erik_gustafson · ‎12-02-2019

Our institution is wanting to take a look at Canvas Data to see what kind of information we can pull out of it. I have looked around the Canvas community quite a bit to learn more about the data portal and what it offers. I have seen how to download and sync data with the Canvas CLI tool, and it seems straightforward. I do have a few questions though to help us get moving forward.

With the CLI tool, does it append the new data from the previous day to the tables, or does it override the old tables with new tables?
1. How do I get all historical data?
Has anyone tried to automate an import into a database running MS SQL?
1. If this isn't a good option, then what would be a good alternative? We are wanting to us MS SQL because that is what we are used to using.

Thank you for your help?

robotcars · ‎12-02-2019

Hi @erik_gustafson ,

There is no delta data or differential sets. You will redownload the same data, plus what is new since the last time you downloaded. Using the Canvas Data CLI Tool, you can use the sync command to keep everything up to date, while the fetch command can be used to download everything for a specific table. The requests table is partial/incremental and has its own historical dump every ~2 months.

SSIS is an option, but I would have had issues maintaining this locally, and didn't like the time constraints if our data department managed it via SSIS. I believe Talend and some other licensed products are available to.

I opted for a Linux VM and some open source. I currently use and support Managing Canvas Data with Embulk, to load CD into MS SQL Server.

erik_gustafson · ‎12-02-2019

Hi carroll-ccsd,

Thank you for the quick reply. So, if I am understanding you correctly, the sync command will keep the tables updated with the latest data, correct?

It sounds like you use Embulk to load data in MS SQL Server instead of SSIS. Is this an automated task?

robotcars · ‎12-02-2019

Correct-ish. It will download all the gzipped files, and on rerun, will download any new gzipped files it doesn't have. I actually don't like this command for our daily needs. I use the fetch command to get a completely new copy of the data everyday. The sync command has an issue, and I honestly don't understand how others tolerate it. If you use the sync command, any records previously downloaded will not get updated, only records that are new, so you essentially have the original state of many records, not it's current state. For my daily import I rely on the fetch command to redownload and import all 100 tables. I incrementally load the requests table daily using the fetch command as well, and delete the previous sequences before hand. Fetch for requests table downloads a short window into the past. Some details on these steps here, Canvas Data CLI - What am I missing?

Embulk is just the data loader, and I have documented some useful tips an tricks (in the repo) as well as the known schema for 4 databases. The repo will help you import each set of tables with a command. My actual workflow is a bash script that goes through a text file and downloads each desired table, then another bash script that loads each table into a staging database with a considerable amount of fault tolerance and logging built in. I then use the staged data to load to production with a slightly different purpose. Automation is finished with a scheduled cron job.

nhumphries · ‎02-15-2020

Hi,

At Belfast Metropolitan College we use the Canvas Data Viewer to download the data and import it into a SQ:L Server. We then use Power BI to visualise the data

rodm · ‎06-11-2020

Hello Nigel. How much work was the setup and how trouble free is the ongoing process?

nhumphries · ‎07-09-2020

Hi Rod,

Its not that difficult to set up and once set up the process is easy to maintain

Nigel

droth2 · ‎09-17-2021

Good to know about Canvas Data Services! Most of our ETL runs through Python API calls and we were able to get a nightly import setup in a day using this Python library: https://github.com/Harvard-University-iCommons/canvas-data-sdk

Nigel - any chance you have written views to join all the star schema that you are interested in sharing?

reynlds · ‎09-21-2021

@nhumphries I'd like to get a look "behind the BI curtain" also (if possible). Thanks.

nhumphries · ‎09-21-2021

@reynlds no problem. We have moved on since I put up this post

reynlds · ‎09-21-2021

@nhumphries Looks good! However, I'm looking for more info regarding the queries that make up the various values like "files", "assignments", etc. Also, how the main visualizations are generated.

helpdesk2 · ‎04-22-2022

@nhumphries I'm just now starting to work with Canvas Data Portal, integrating it into SQL and Power BI. I would love to chat with you about your process, things to look out for and best tips and tricks.

Currently I have only installed the Command line tool and I'm trying to get data on a particular quiz that is in all courses since we start to use templates for all courses. There has to be an easier way LOL. But I'm not sure which data file would contain the quiz that is in all courses if the ID is different in every course, any ideas or experience with this?

nhumphries · ‎09-21-2021

@reynlds do you use Teams? if so you can call me nhumphries@belfastmet.ac.uk

mvanmatre · ‎04-21-2022

The Big Question from our developer:
Reading documentation online about Canvas Data Services and the Canvas Data Portal, I am aware that there is a way to download and sync Canvas data for view in outside tools, such as SQL Server Management Studio with the Canvas CLI tool. However, I have not found where to do so. Can someone point me in the right direction?

ben_hudson · ‎04-22-2022

Good Morning,

To get access to the Canvas Data Portal I believe you need to contact your CSM then they will configure your initial admin(s) and user(s), or ours did at least. Then we implemented this Canvas Data Viewer it did take some tweaking to get things to fully work. I would need to crawl through our process to get any specifics. The main issue we encountered was "Some tables are not created and populated #20" I have some lengthy comments in there that explain how I resolved the issues. We currently use our data for tracking daily access and assignment submissions for our online academy (not Instructure recommended), I have also written queries to track student and teacher last login as well but have received no other requests from our admin team so I have actually supressed the loading of a lot of data to speed the process up.

Also Canvas Data 2 is on the way so probably keep that in mind with implementing current Canvas Data.

Thanks,

Ben

Using Canvas Data with MS SQL Management Studio

AWS Harvard Data 1 extract conversion to Data 2

Error running initdb with DAP 1.1

CD1 to CD2 schema mapping document.

Is there a way to translate bash script to Azure w...

CD2 and Course Audit information

AWS Harvard Data 1 extract conversion to Data 2

DAP parquet file transformation strategy

pysqlsync - no rows to upsert/insert - tsv2py not ...

CD2 - Weblogs

Incremental query with missing pseudonym entries

You're signed out

Using Canvas Data with MS SQL Management Studio

Community Help

View our top guides and resources: