We wanted a way to import selected data from selected Canvas tables into a Microsoft SQL Server instance that could be scheduled for automation and required the smallest possible set of dependencies and tools. Using canvas-data-cli, we were able to accomplish this on a Windows server with only Node.js, PowerShell, and the SQL Server Agent. There are many other solutions out there for this, but we wanted to share ours because of its maintenance simplicity. There is a bit of setup the first time you want to grab a new table, but overall it is very simple and can be managed with very little database knowledge, making it maintainable even by new admins.
The setup for canvas-data-cli is well documented; all we had to do was install Node.js on the server. When we installed the CLI, we wanted to make sure it lived in a global location, so we ran:
npm config set prefix C:\node-modules
npm install -g canvas-data-cli
We then keep a .ps1 script in C:\Scripts, which we execute via a scheduled task running under a local user whose only privilege is "Log on as a Batch Job", granted in the local Group Policy. The script's content is attached in the .ps1.txt file. It defines some file paths at the top, including the path to config.js, which must be set up per the canvas-data-cli instructions. Next, there is an array listing (at the time of writing) every Canvas table available for download, with the ones we actually want to download uncommented. The script downloads the latest version of the schema file, then fetches each indicated table, unpacks it, replaces the "\N" sequences that represent NULL with nothing, and converts line endings for import. Once the script finishes, all the downloaded tables are on the server and ready for import into SQL Server.
Now we need to create a .sql script for each table we want to import. We have a single SQL Server Agent job, and each table we import becomes a step in that job with its own script. One challenge we ran into was mapping data types to fields: we use the BULK INSERT statement in T-SQL, and there is no type conversion with that method. We found that timestamps need to be brought in as varchar values and sanitized, and booleans need to come in as varchar values and then be converted to bit.
For each table, we took the schema.json file, ran it through a JSON pretty-printer (there are many online that create a collapsible tree view), and used it to define the types for each column. We looked at the export and figured out which fields need to permit NULL values and which do not. Then we created a staging table and used BULK INSERT with explicit field and row terminators to import the data. One thing we found: because of the Unix line endings, we had to use a special character for the row terminator instead of just "\n"; see the .sql.txt file below. When bringing the data into staging, bring in every column, not just the ones you want.
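As a rough sketch of what one of those staging steps can look like (this is not the attached script; the table name, column list, and file path are hypothetical, and you should check them against your own schema.json and download location):

-- Staging table: timestamps and booleans land as varchar and get converted later.
IF OBJECT_ID('dbo.stg_course_dim') IS NOT NULL
    DROP TABLE dbo.stg_course_dim;

CREATE TABLE dbo.stg_course_dim (
    id               bigint       NULL,
    canvas_id        bigint       NULL,
    name             varchar(256) NULL,
    conclude_at      varchar(50)  NULL,  -- sanitized and cast to datetime later
    publicly_visible varchar(10)  NULL   -- 'true'/'false', converted to bit later
);

-- The unpacked extracts are tab-delimited; 0x0a handles the LF-only row endings.
-- Add FIRSTROW = 2 if your unpacked file includes a header row.
BULK INSERT dbo.stg_course_dim
FROM 'C:\CanvasData\unpackedFiles\course_dim.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '0x0a',
    TABLOCK
);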
Once the data is in the staging table, we create the final production table, select which data we want to keep, and perform any data type manipulation that is needed. Below are examples of how we handled booleans and datetimes:
, CASE
      WHEN conclude_at IS NULL THEN NULL
      WHEN LEFT(conclude_at, 4) < 1900 THEN NULL
      ELSE CAST(conclude_at AS datetime)
  END AS conclude_at
, CAST(
      CASE
          WHEN publicly_visible = 'true' THEN 1
          WHEN publicly_visible = 'false' THEN 0
          ELSE NULL
      END AS bit
  ) AS publicly_visible
We had some miscellaneous courses with dates in the 1600s, which caused out-of-range errors, so we now catch those whenever we import dates. There may be other tweaks as we expand this to more tables, but the methodology is pretty simple. Going through the columns the first time is a bit of a pain, but once you do it for a table it is done going forward. We recommend representing every column in the code, but commenting out the ones you don't want so it is trivial to add them back later; the sketch below shows the idea.
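Here is a hedged sketch of how those pieces fit into the staging-to-production step (the table name and the extra commented-out column are hypothetical; the real scripts are in the attached .sql.txt):

-- Rebuild the production table from staging, keeping only the columns we want.
IF OBJECT_ID('dbo.course_dim') IS NOT NULL
    DROP TABLE dbo.course_dim;

SELECT
      id
    , canvas_id
    , name
    , CASE
          WHEN conclude_at IS NULL THEN NULL
          WHEN LEFT(conclude_at, 4) < 1900 THEN NULL  -- catch out-of-range years
          ELSE CAST(conclude_at AS datetime)
      END AS conclude_at
    , CAST(
          CASE
              WHEN publicly_visible = 'true' THEN 1
              WHEN publicly_visible = 'false' THEN 0
              ELSE NULL
          END AS bit
      ) AS publicly_visible
    -- , some_other_column  -- unwanted columns stay commented out for easy re-adding
INTO dbo.course_dim
FROM dbo.stg_course_dim;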
We schedule the SQL Server Agent job to run daily, about 30 minutes after the download task, and every day we have fresh tables of selected Canvas data available for whatever needs to be done with them. Feedback is welcome, and we hope this is useful for folks out there.
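If you prefer to script the schedule rather than set it up in the SQL Server Agent GUI, something like the following should work; the job name and start time are assumptions you would replace with your own:

-- Run the (hypothetical) 'Canvas Data Import' job every day at 04:30,
-- roughly 30 minutes after the scheduled download task in this example.
EXEC msdb.dbo.sp_add_jobschedule
    @job_name          = N'Canvas Data Import',
    @name              = N'Daily Canvas import',
    @enabled           = 1,
    @freq_type         = 4,       -- daily
    @freq_interval     = 1,       -- every 1 day
    @active_start_time = 43000;   -- HHMMSS, i.e. 04:30:00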
Joseph,
Thank you so much for your explanation; it encourages me to believe that we can do similar work on our campus. I have some questions: (1) do you limit the data at the outset (say the last five years), so that the extracted tables don't go back to the start of time? (2) how much space in SQL Server is the extract taking up? (3) will this work with Canvas Data 2.0?
I need the data on SQL Server because I'm interested in joining tables from our student data warehouse, which is on SQL Server. However, I'm looking into asking someone else to do the database admin for the extraction.
I actually did a v2 of this guide that is Canvas Data v2 compliant; it's here:
https://community.canvaslms.com/t5/Data-and-Analytics-Group/Guide-Using-Canvas-Data-v2-to-download-e...
We don't limit the data at the outset, and for the tables we're pulling it comes to about 250 MB each pull.
Joseph,
Thank you for your reply and your significant contribution to the community. I for one am benefitting greatly from this.
Fernando