Automated import of selected Canvas tables into Microsoft SQL Server using only canvas-data-cli (via node) and SQL Server Agent.


We wanted a way to import selected data from selected Canvas tables into a Microsoft SQL Server instance that could be scheduled for automation and that required the smallest number of dependencies and tools.  Using canvas-data-cli, we were able to accomplish this on a Windows server with only Node.js, PowerShell, and the SQL Server Agent.  There are many other solutions out there for this, but we wanted to share ours because of its maintenance simplicity.  There is a bit of setup the first time you want to grab a new table, but overall it is extremely simple and can be managed with very little DB knowledge, making it maintainable even by new admins.

The setup for canvas-data-cli is well documented; all we had to do was install Node.js on the server.  When we installed the CLI, we wanted to make sure it ended up in a global location, so we did this:

 

npm config set prefix C:\node-modules
npm install -g canvas-data-cli

 

We then keep a .ps1 script in C:\Scripts which we execute via a scheduled task, running under a local user with no privileges other than "Log on as a batch job", which is set in the local group policy.  The content is attached in the .ps1.txt file.  The script defines some file paths at the top, including the path to config.js, which needs to be set up per the canvas-data-cli instructions.  Next, there is an array listing (at the time of creation) every Canvas table available for download, with the ones we actually want to download uncommented.  The script downloads the latest version of the schema file, then fetches each indicated table, unpacks it, replaces the "\N" sequences representing NULL with nothing, and finally converts line endings for import.  Once the script finishes, all the downloaded tables are on the server and ready for import into SQL Server.

Now, we need to create a .sql script for each table we want to import.  We have a single SQL Server Agent job, and each table we import becomes a step in that job with its own script.  One challenge we found was in mapping datatypes to fields: we use the BULK INSERT statement in T-SQL, and there is no type conversion for this method.  We found that timestamps need to be brought in as varchar values and sanitized, and booleans need to come in as varchar values and then be converted to bit.
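As a hedged illustration of that mapping (the table and column names below are placeholders patterned on the course table, not a copy of our actual staging script), a staging table keeps the timestamp and boolean columns as varchar:

CREATE TABLE dbo.stg_course (
	  id				bigint			NULL
	, canvas_id			bigint			NULL
	, name				varchar(256)	NULL
	, workflow_state	varchar(50)		NULL
	, conclude_at		varchar(50)		NULL	-- timestamp kept as text, sanitized later
	, publicly_visible	varchar(10)		NULL	-- arrives as the strings 'true' / 'false'
	-- ...the real staging table needs every column from the dump, in file order
);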

For each table, we took the schema.json file, ran it through a JSON pretty-printer (there are many online that create a collapsible tree view) and used it to define the types for each column.  We looked at the export and figured out which fields need to permit NULL values and which do not.  Then we created a staging table and used BULK INSERT with defined field and row terminators to import the data.  One thing we found: because of the Unix line endings, we had to use a special character for row termination instead of just "\n"; see the .sql.txt file below.  When bringing the data into staging, you should bring in every column, not just the ones you want.
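A minimal sketch of that load step, assuming the PowerShell script left the unpacked file at a path like the one below (the path and staging table name are placeholders):

BULK INSERT dbo.stg_course
FROM 'C:\CanvasData\unpackedFiles\course.txt'
WITH (
	  FIELDTERMINATOR = '\t'	-- Canvas Data flat files are tab-delimited
	, ROWTERMINATOR = '0x0a'	-- bare LF; a literal '\n' is treated as CR+LF and fails on Unix-style files
	, FIRSTROW = 2				-- skip the header row; drop this line if your unpacked files have none
	, TABLOCK
);

The 0x0a hex code is the documented way to tell BULK INSERT that rows end with a line feed only.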

Once the data is in the staging table, we then create the final production table and select which data we want to import, as well as perform any datatype manipulation that is needed.  Below are examples of how we handled booleans and datetimes:

 

	-- Timestamps arrive as text; years before 1900 (some courses had 1600s dates) become NULL
	, CASE
		WHEN conclude_at IS NULL THEN NULL
		WHEN LEFT(conclude_at, 4) < 1900 THEN NULL
		ELSE CAST(conclude_at AS datetime)
		END as conclude_at
	-- Booleans arrive as the strings 'true'/'false' and are converted to bit
	, CAST(
		CASE
			WHEN publicly_visible = 'true' THEN 1
			WHEN publicly_visible = 'false' THEN 0
			ELSE NULL
			END as bit
		) as publicly_visible

 

We had some miscellaneous courses with years in the 1600s, which caused out-of-range errors (datetime in SQL Server only goes back to 1753), so we now catch them whenever we import dates.  There may be other tweaks as we expand this to more tables, but the methodology is pretty simple.  The setup process of going through the columns is a bit of a pain, but once you do it a single time it is done going forward.  We recommend representing every column in the code and then commenting out the final columns you don't want, so it is trivial to add them back later.
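Putting the pieces together, one complete "staging to production" step might look roughly like the sketch below; dbo.course, dbo.stg_course, and the column list are illustrative, the real list comes from your schema.json, and our actual scripts are in the attached .sql.txt:

IF OBJECT_ID('dbo.course', 'U') IS NOT NULL
	DROP TABLE dbo.course;

SELECT
	  id
	, canvas_id
	, name
	-- , workflow_state		-- stays in staging, commented out here until we need it
	, CASE
		WHEN conclude_at IS NULL THEN NULL
		WHEN LEFT(conclude_at, 4) < 1900 THEN NULL
		ELSE CAST(conclude_at AS datetime)
		END as conclude_at
	, CAST(
		CASE
			WHEN publicly_visible = 'true' THEN 1
			WHEN publicly_visible = 'false' THEN 0
			ELSE NULL
			END as bit
		) as publicly_visible
INTO dbo.course
FROM dbo.stg_course;

SELECT ... INTO recreates the production table from scratch each run; swapping in CREATE TABLE plus INSERT ... SELECT works just as well if you want tighter control over the production datatypes.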

We schedule the SQL Server Agent job about 30 minutes after the daily import task, and every day we have fresh tables of selected Canvas data available for whatever needs to be done with them.  Feedback is welcome, and we hope this is useful for folks out there.
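If you would rather script the schedule than click through SSMS, something along these lines attaches a daily schedule to the Agent job (the job name and start time are placeholders; pick a time comfortably after your scheduled task):

EXEC msdb.dbo.sp_add_jobschedule
	  @job_name = N'Canvas Data Import'		-- placeholder job name
	, @name = N'Daily after canvas-data-cli download'
	, @enabled = 1
	, @freq_type = 4				-- daily
	, @freq_interval = 1			-- every day
	, @freq_subday_type = 1			-- at the specified time
	, @active_start_time = 053000;	-- 05:30:00, illustrative only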
