Canvas Data API Authentication

James
Community Champion

Synopsis

Calls to the Canvas Data API need to be authenticated. The Canvas Data API documentation provides example code that isn't explained very well. This post will explain what's going on so that people can prepare the signature needed to use the API. The code supplied by Canvas will be simplified to take advantage of the limited nature of the existing API. Finally, PHP and PERL code snippets will be given to assist those more familiar with those languages than JavaScript.

Introduction

The Canvas API uses tokens that translate into a user and password combination. The Canvas Data API does not. Instead, it requires that you sign your messages with a hash generated from an API Secret that Canvas provides. You then pass your API Key, also provided by Canvas, and the signed hash in a header. Canvas provides sample JavaScript / Node.js code to accomplish this.

What's missing from the example code provided is exactly what is required to use it. That is, they give the code, but never show how it is invoked.

Existing Documentation

Signed Message

The signed message is made up of the following parts, which must appear in this order:

  1. The method of the request in uppercase. Methods might be GET, POST, PUT, DELETE, or HEAD. However, the only one currently supported by the API is GET.
  2. The host the request is being made to. This is the first major fail of the documentation. The host is portal.inshosteddata.com. That's not mentioned anywhere, so people think it's their production Canvas instance like canvas.instructure.com.
  3. The contentType is useful for a POST, but not for a GET. Currently, there are no POST endpoints, only GET ones, so leave this blank for now.
  4. The contentMD5 is an MD5 sum of the content, but you're not sending any content so leave this blank as well.
  5. The path portion of the URL used to connect to the API. This is available in the API endpoints and looks like /api/account/self/dump or /api/schema.
  6. Any query parameters, listed in alphabetical order. Currently, there are only two endpoints that support query parameters: /api/account/self/dump and /api/account/self/file/byTable/:tableName. These both support after and limit, which would make the query and path look like /api/account/self/dump?limit=100. Most of the API endpoints don't need query parameters and so this can be left blank. These have to be in alphabetical order because the hash changes for every string, so limit=100&after=45 returns a different hash than after=45&limit=100. Canvas has to be able to duplicate your string exactly.
  7. The timestamp in HTTP Date format (RFC 7231). It looks like Tue, 01 Dec 2015 05:22:24 GMT. The documentation says you may also use ISO-8601 format, which looks like 2015-12-01T05:22:24.324Z. I have tested it and it works as described and also in some variant formats. For the RFC 7231 form, it accepts non-standard day-of-week abbreviations (Tues, Thur). For the ISO-8601 form, it works with or without the milliseconds at the end.
  8. The API Secret that was provided by Canvas.

These 8 pieces of information are joined together with a newline character into a single string.
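Here's a minimal sketch of that joining step in JavaScript. The helper name joinMessageParts and the placeholder secret are my own assumptions for illustration; they don't come from the Canvas documentation.

```javascript
// Sketch: join the eight parts, in order, with newline characters.
// joinMessageParts is a hypothetical helper name used only for this example.
function joinMessageParts(parts) {
  var ordered = [
    parts.method.toUpperCase(), // 1. method in uppercase
    parts.host,                 // 2. host
    parts.contentType || '',    // 3. contentType (blank for GET)
    parts.contentMD5 || '',     // 4. contentMD5 (blank when no content is sent)
    parts.path,                 // 5. path portion of the URL
    parts.query || '',          // 6. query parameters, already alphabetized
    parts.timestamp,            // 7. timestamp
    parts.secret                // 8. API Secret
  ];
  return ordered.join('\n');
}

var message = joinMessageParts({
  method: 'get',
  host: 'portal.inshosteddata.com',
  path: '/api/schema',
  timestamp: 'Tue, 01 Dec 2015 05:22:24 GMT',
  secret: 'EXAMPLE_SECRET' // placeholder, not a real secret
});

// JSON.stringify makes the embedded newlines visible as \n
console.log(JSON.stringify(message));
```

Note that the blank parts still contribute their newline, so the message always has eight lines.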

This string is then used as the message in a Hash-based Message Authentication Code (HMAC). An HMAC is keyed with a secret, which in this case is your API Secret. Different hash functions are available; the one that Canvas uses is the Secure Hash Algorithm (SHA) with a 256-bit digest. Put all that together and you end up with HMAC-SHA-256.

Note that there is also a SHA-256 implementation that does not use a salt or key. You need to get the HMAC version to make this work.

That hash is a 256-bit value, represented in binary as 32 8-bit bytes. However, you can't easily send binary data in a web request, so it needs to be converted into a form that can be sent in the API request.

There are two main methods to convert the binary hash into something usable.

The first is to represent it in hexadecimal. Each byte can be represented by a 2-character hexadecimal code that uses the characters 0-9 and the letters a-f.  Those 32 bytes would become 64 hexadecimal characters.

Some of you may have seen hashes written in hexadecimal form before, as many places supply one when you download software so you can make sure that you're getting a good version. For those running Linux, there is a sha256sum command that you can run on a file and it will spit out a 64-character hexadecimal representation of the SHA-256 hash of the file. For example, here is the hash (not signed with a key) for one of the files downloaded from Canvas Data (the $ is the command prompt).

$ sha256sum wiki_page_fact-part-01833.gz
0b14f38fd7fe4df3f94e8b82d2c00e984a202e996f921f8bf4eca5dff3792532  wiki_page_fact-part-01833.gz

However, Canvas does not use the hexadecimal form.

The other major way to represent the binary hash is to use what's called base64 encoding. This is the same method used to send attachments through email. Encoding the binary data generally adds 1/3 of the size, but it's smaller than hexadecimal encoding, which doubles the size. A base64 encoded string must have a length that is a multiple of 4 and the string is padded with equal signs at the end to meet the length requirement.

Linux systems contain a base64 command and here is an example of what the strings look like (the -w 96 is just to keep it from wrapping).

$ echo 0b14f38fd7fe4df3f94e8b82d2c00e984a202e996f921f8bf4eca5dff3792532 | base64 -w 96
MGIxNGYzOGZkN2ZlNGRmM2Y5NGU4YjgyZDJjMDBlOTg0YTIwMmU5OTZmOTIxZjhiZjRlY2E1ZGZmMzc5MjUzMgo=

The above is a base64 encoded version of the hexadecimal version of the sha256 sum for the file I downloaded from Canvas Data.

But don't use that either! Remember that Canvas does not use the hexadecimal form at all. It doesn't base64 encode the 64 character hexadecimal version, it encodes the binary version of the 32 byte HMAC digest.

When you sign your message, you need to generate a binary version of the HMAC digest and then base64 encode that.
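To make the difference concrete, here's a small Node.js sketch that produces both the correct form (base64 of the binary digest) and the mistaken form (base64 of the hexadecimal text). The secret and message are placeholders I made up for illustration.

```javascript
var crypto = require('crypto');

// Placeholder inputs, assumptions for illustration only
var secret = 'EXAMPLE_SECRET';
var message = 'GET\nportal.inshosteddata.com\n\n\n/api/schema\n\nTue, 01 Dec 2015 05:22:24 GMT\n' + secret;

// digest() with no encoding returns the raw 32-byte binary digest as a Buffer
var rawDigest = crypto.createHmac('sha256', secret).update(message).digest();

// The 64-character hexadecimal text form of the same digest
var hexDigest = rawDigest.toString('hex');

// Correct: base64 encode the 32 binary bytes
var correct = rawDigest.toString('base64');

// Wrong: base64 encode the 64 hexadecimal characters as if they were the data
var wrong = Buffer.from(hexDigest).toString('base64');

console.log(rawDigest.length, hexDigest.length, correct.length, wrong.length);
// prints: 32 64 44 88
```

If your signature is 88 characters long instead of 44, you have almost certainly encoded the hexadecimal text rather than the binary digest.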

Authentication Headers

Once you generate your signed message, you add it to the HTTP headers.

According to the API documentation, you add them "like so":

Authorization: HMACAuth API_KEY:signature
Date: Thur, 25 Jun 2015 08:12:31 GMT

What they don't tell you is that the HMACAuth is the literal string HMACAuth but the API_KEY is replaced by your API Key obtained from the Canvas Data Portal. The signature is replaced by the base64 encoded version of the HMAC-SHA-256 binary digest.

The Date is the RFC 7231 date that was used in the generation of the signature. You must save and reuse the timestamp that you used to generate the signature. Do not make two separate calls to get the date and time because they may be different and then the signature would not match and the request would fail and you would get frustrated.
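A minimal sketch of that rule, assuming a hypothetical helper called buildAuthHeaders and placeholder credentials: the clock is read once, and the same string goes into both the signed message and the Date header.

```javascript
var crypto = require('crypto');

// buildAuthHeaders is a hypothetical helper name used only for this example.
// The timestamp is passed in so that the signature and the Date header
// are guaranteed to use the very same string.
function buildAuthHeaders(apiKey, apiSecret, host, path, query, timestamp) {
  var message = ['GET', host, '', '', path, query, timestamp, apiSecret].join('\n');
  var signature = crypto.createHmac('sha256', apiSecret).update(message).digest('base64');
  return {
    Authorization: 'HMACAuth ' + apiKey + ':' + signature,
    Date: timestamp
  };
}

// Read the clock exactly once and reuse the result
var timestamp = new Date().toUTCString();

var headers = buildAuthHeaders(
  'YOUR_API_KEY',    // placeholder
  'YOUR_API_SECRET', // placeholder
  'portal.inshosteddata.com',
  '/api/schema',
  '',
  timestamp
);

console.log(headers);
```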

Notice that the date in the example is given as "Thur", although section 7.1.1.1 Date/Time Formats of the RFC7231 says "the preferred format is a fixed-length and single-zone subset of the date and time specification used by the Internet Message Format RFC5322." You are safe using the three-character day of week. I can verify that Tues works because I tested this on a Tuesday, but because the time has to be within 15 minutes of the server time, I can't test "Thurs" to see if it works. Just stick with the preferred three-character day and you're good.

Example for Testing

One important thing to have is the ability to test your code and make sure you're getting the right signature.

The table below contains some values you can use for testing and to demonstrate what the actual headers should look like. Don't worry, they're not my real API Key and API Secret, they didn't even come from Canvas, they're just SHA1 hashes of a couple of files on my server. You should never share your API secret with anyone.

Parameter    Value
API Key      27f65b589c0c21f4bd29fd2f0e1cdf552a578f98
API Secret   335df060619bcc3f8562d58a57c22c44b90ee122
URL          https://portal.inshosteddata.com/api/account/self/dump?limit=100&after=45
Timestamp    Tue, 01 Dec 2015 09:24:50 GMT

The valid HMAC signature for the example is: sOIJs/UZ7AySaRFfhRSFqDKlN93Ei+VvpZsVcKDfiJw=

If you use the values above, then your HTTP headers sent with the request should look like this

Authorization: HMACAuth 27f65b589c0c21f4bd29fd2f0e1cdf552a578f98:sOIJs/UZ7AySaRFfhRSFqDKlN93Ei+VvpZsVcKDfiJw=
Date: Tue, 01 Dec 2015 09:24:50 GMT

Additional Debugging

If you get X2CLfY2iMUlR3TJOK2G2q4Ix6e4mOLpmzOQ1H7RGDpY= for your HMAC signature, then everything is working properly except that your query parameters are not in alphabetical order. I put them in the wrong order to make sure that your code was alphabetizing them.

Several people have asked for a string that they can use for testing. This way they can verify the HMAC portion separately from the code to generate the strings.

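Here are the eight lines that need to be joined with a newline character (\n) to form the message. The third and fourth lines, for the contentType and contentMD5, are empty.

GET
portal.inshosteddata.com
(empty line for contentType)
(empty line for contentMD5)
/api/account/self/dump
after=45&limit=100
Tue, 01 Dec 2015 09:24:50 GMT
335df060619bcc3f8562d58a57c22c44b90ee122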

The resulting string with the newlines is

GET\nportal.inshosteddata.com\n\n\n/api/account/self/dump\nafter=45&limit=100\nTue, 01 Dec 2015 09:24:50 GMT\n335df060619bcc3f8562d58a57c22c44b90ee122
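If you want to check just the HMAC portion in Node.js, a sketch like the following signs the test string and compares the result with the signature quoted in the example above. It only reports whether the two match; the variable names are mine.

```javascript
var crypto = require('crypto');

// Test values from the example above (not real credentials)
var secret = '335df060619bcc3f8562d58a57c22c44b90ee122';
var message = [
  'GET',
  'portal.inshosteddata.com',
  '', // contentType, empty for GET
  '', // contentMD5, empty because no content is sent
  '/api/account/self/dump',
  'after=45&limit=100',
  'Tue, 01 Dec 2015 09:24:50 GMT',
  secret
].join('\n');

// Signature quoted in this post for the test values
var published = 'sOIJs/UZ7AySaRFfhRSFqDKlN93Ei+VvpZsVcKDfiJw=';

// HMAC-SHA-256 keyed with the secret, binary digest encoded as base64
var signature = crypto.createHmac('sha256', secret).update(message).digest('base64');

console.log('computed :', signature);
console.log('published:', published);
console.log('match    :', signature === published);
```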

Provided Example Code

Let's take a look at the code that Canvas provided in their documentation.

var crypto = require('crypto')
var url = require('url')
var HMAC_ALG = 'sha256'
var apiAuth = module.exports = {
  buildMessage: function(secret, timestamp, reqOpts) {
    var urlInfo = url.parse(reqOpts.path, true)
    var sortedParams = Object.keys(urlInfo.query).sort(function(a, b) {
      return a.localeCompare(b)
    })
    var sortedParts = []
    for (var i = 0; i < sortedParams.length; i++) {
      var paramName = sortedParams[i]
      sortedParts.push(paramName + '=' + urlInfo.query[paramName])
    }
    var parts = [
      reqOpts.method.toUpperCase(),
      reqOpts.host || '',
      reqOpts.contentType || '',
      reqOpts.contentMD5 || '',
      urlInfo.pathname,
      sortedParts.join('&') || '',
      timestamp,
      secret
    ]
    return parts.join('\n')
  },
  buildHmacSig: function(secret, timestamp, reqOpts) {
    var message = apiAuth.buildMessage(secret, timestamp, reqOpts)
    var hmac = crypto.createHmac(HMAC_ALG, new Buffer(secret))
    hmac.update(message)
    return hmac.digest('base64')
  }
}

Required Parameters

When you look at line 28, you'll see a message that is built from a secret, a timestamp, and reqOpts.

  • The secret is the easy part, it's your API Secret provided by Canvas.
  • The timestamp is an RFC 7231 date (HTTP date) although it indicates that ISO-8601 dates should work as well. The timestamp must be within 15 minutes of the server time to be valid. This must be a GMT or UTC time and not your local time.
  • The reqOpts is the mystery.

Looking at lines 16-19, it looks like the required parts are the method, the host, the contentType, and the contentMD5. As far as the code is concerned, though, the host, the contentType, and the contentMD5 are optional: if you don't specify them, they default to the empty string. The method has no default, so you must supply it; calling toUpperCase() on a missing method throws an error.

The problem with that is that the signature won't be valid if you don't specify a host.

There's another required parameter in there. Looking at line 6, you see that path is required in the url.parse() method to parse and give urlInfo. The path is really the URL to the API call, so it should look like https://portal.inshosteddata.com/api/account/self/dump?limit=100&after=45

Note that I provided optional parameters limit and after in the example and that I put them in the wrong order. Lines 7-14 are there to alphabetize the keys from the parameters. Depending on the language and library you use, you may be able to guarantee that the query parameters will be in alphabetical order, but Canvas can't take that chance and so it's best to include a sort.

Anyway, in order for the provided example to work, you need to pass method, host, and the full URL of the API call as path.

What do I need to do?

If you want to use Canvas' code with JavaScript or Node.js, then here's how to invoke their code.

var today = new Date();
var timestamp = today.toUTCString();
var opts = {
  method: 'GET',
  host: 'portal.inshosteddata.com',
  path: 'https://portal.inshosteddata.com/api/account/self/dump?limit=100&after=45'
};
var hmac = apiAuth.buildHmacSig('API Secret', timestamp, opts);

Luckily, the Date.toUTCString() method generates an HTTP Date (RFC 7231) formatted timestamp for you.

Line 6 is up to you, based on what API call you want to make.

Replace the 'API Secret' in line 8 with a variable containing your actual API secret.
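If you want to see what the two timestamp formats look like without waiting for a particular day, this small sketch formats a fixed instant (the one used in the testing example) both ways. The variable names are mine.

```javascript
// Fixed instant: 1 December 2015, 09:24:50 UTC (month index 11 is December)
var fixed = new Date(Date.UTC(2015, 11, 1, 9, 24, 50));

// HTTP Date (RFC 7231) form, always expressed in GMT
var httpDate = fixed.toUTCString();

// ISO-8601 form, including milliseconds
var isoDate = fixed.toISOString();

console.log(httpDate); // Tue, 01 Dec 2015 09:24:50 GMT
console.log(isoDate);  // 2015-12-01T09:24:50.000Z
```

Both methods work from UTC, so your local time zone doesn't affect the output.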

Sacrificing Flexibility for Simplicity

Why is this so Complicated?

"So" is relative. Some people may understand this without the blog, but the documentation is lacking. The actual process isn't that complicated once you have all the pieces, but Canvas' example without explanations made it harder than it needs to be. It's further complicated because they are using a generic routine that can be used in lots of cases, but it can be simplified to work with the current API offering.

There are really only three things that you need to know to create the HMAC-SHA-256 signature.

  • The timestamp. Most languages have a date() or similar function that will generate this for you. You can also use the strftime() function in many of them. Remember to use UTC or GMT time and not the local time.
  • The full URL of the API call.
  • The API Secret provided through the Canvas Data Portal.

Here's why you don't need everything shown in the example.

  • The only method supported right now is GET, so you don't have to specify it. Granted, that might change at some point and maybe Canvas will allow users to delete old versions before they expire automatically, but you're good for now.
  • The host is obtained through the url.parse() command. There's no need to specify it in the reqOpts and line 17 could be written using urlInfo.host instead of reqOpts.host.
  • The url.parse() command also returns the path and the query parameters as urlInfo.pathname and urlInfo.query. This is already in the code.
  • The contentType and contentMD5 aren't used, so they're always empty in the current incarnation.

Rewritten Code

This code works in the current API state, but it may change in the future. I have no control over that.

var crypto = require('crypto')
var url = require('url')
var HMAC_ALG = 'sha256'
var apiAuth = module.exports = {
  buildMessage: function(secret, timestamp, uri) {
    var urlInfo = url.parse(uri, false);
    var query = urlInfo.query ? urlInfo.query.split('&').sort().join('&') : '';
    var parts = [
      'GET',
      urlInfo.host,
      '',
      '',
      urlInfo.pathname,
      query,
      timestamp,
      secret
    ]
    return parts.join('\n')
  },
  buildHmacSig: function(secret, timestamp, uri) {
    var message = apiAuth.buildMessage(secret, timestamp, uri)
    var hmac = crypto.createHmac(HMAC_ALG, Buffer.from(secret)) // Buffer.from() replaces the deprecated new Buffer()
    hmac.update(message)
    return hmac.digest('base64')
  }
}

I've shaved 7 lines off the code and if you want to rewrite the parts array into a single line, you can have it in 17 lines.

var crypto = require('crypto')
var url = require('url')
var HMAC_ALG = 'sha256'
var apiAuth = module.exports = {
  buildMessage: function(secret, timestamp, uri) {
    var urlInfo = url.parse(uri, false);
    var query = urlInfo.query ? urlInfo.query.split('&').sort().join('&') : '';
    var parts = [ 'GET', urlInfo.host, '', '', urlInfo.pathname, query, timestamp, secret ]
    return parts.join('\n')
  },
  buildHmacSig: function(secret, timestamp, uri) {
    var message = apiAuth.buildMessage(secret, timestamp, uri)
    var hmac = crypto.createHmac(HMAC_ALG, Buffer.from(secret)) // Buffer.from() replaces the deprecated new Buffer()
    hmac.update(message)
    return hmac.digest('base64')
  }
}

What I did was change the url.parse() command to not split up the query parameters. I then split the query string on the &, sorted it, and joined it back together in a single statement instead of lines 7-14 that Canvas used. You do need to check that there are actually some parameters, which is what the condition in the ternary operator is for.
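Here's the split, sort, and join step on its own so you can see what it does. This sketch uses the built-in URL class rather than url.parse(), which is my substitution, and sortedQuery is a hypothetical helper name.

```javascript
// sortedQuery is a hypothetical helper name used only for this example.
// It uses the WHATWG URL class that is built into Node.js instead of url.parse().
function sortedQuery(uri) {
  // search is '?limit=100&after=45', or '' when there is no query string
  var query = new URL(uri).search.slice(1);
  // Each piece is a 'name=value' string, so a plain sort orders by name
  return query ? query.split('&').sort().join('&') : '';
}

console.log(sortedQuery('https://portal.inshosteddata.com/api/account/self/dump?limit=100&after=45'));
// prints: after=45&limit=100

console.log(JSON.stringify(sortedQuery('https://portal.inshosteddata.com/api/schema')));
// prints: ""
```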

Better yet, when you call it, you have a simpler call.

var today = new Date();
var timestamp = today.toUTCString();
var uri = 'https://portal.inshosteddata.com/api/account/self/dump?limit=100&after=45';
var hmac = apiAuth.buildHmacSig('API Secret', timestamp, uri);

Other Languages

Part of the problem with the comprehension is lack of documentation. Another issue may be that people don't understand JavaScript. I know PHP and PERL much better than JavaScript, and so I tackled this problem in those languages.

PHP

In PHP, you can use the gmdate() function to obtain the timestamp.

$timestamp = gmdate( 'D, d M Y H:i:s T' );

For the base64 encoded version of the HMAC-SHA-256 hash, you can use base64_encode() and hash_hmac(). The $message in the following code is the parts of the message joined together with a newline "\n" character. The TRUE argument tells hash_hmac() to return the raw binary digest instead of the hexadecimal form.

$hmac = base64_encode( hash_hmac( 'sha256', $message, $secret, TRUE ) );

PHP Code Snippet

<?php
// This is a code snippet

// Minimal information needed to generate signature
$timestamp = gmdate( 'D, d M Y H:i:s T' );
$url = 'https://portal.inshosteddata.com/api/account/self/dump?limit=3';
$api_secret = 'YOUR SUPER SECRET API SECRET GOES HERE';

// Generate signature
$hmac = hmac_signature( $timestamp, $url, $api_secret );
printf( "HMAC signature: %s\n", $hmac );

// Headers to add to request
$api_key = 'YOUR API KEY GOES HERE';
$http_headers = array ( 'Authorization: HMACAuth ' . $api_key . ':' . $hmac,
    'Date: ' . $timestamp );

// This version is slimmed down to only include what is needed for the current API
function hmac_signature($timestamp = NULL, $url = NULL, $secret = NULL) {
  if (empty( $timestamp ) || empty( $url ) || empty( $secret )) {
    return;
  }
  $u = parse_url( $url );
  if ($u === FALSE) {
    return;
  }
  $host = ! empty( $u['host'] ) ? $u['host'] : 'portal.inshosteddata.com';
  $path = $u['path'];
  $query = '';
  if (! empty( $u['query'] )) {
    $parms = explode( '&', $u['query'] );
    sort( $parms );
    $query = join( '&', $parms );
  }
  $parts = array ( 'GET', $host, '', '', $path, $query, $timestamp, $secret );
  $message = join( "\n", $parts );
  return base64_encode( hash_hmac( 'sha256', $message, $secret, TRUE ) );
}

Note that there is a bug in the routine that displays the code here in the community. It may show empty() as emptyempty(). For a working version, fetch the code from the Canvancement site on GitHub.

PERL

In PERL, you can use the HTTP::Date module to get the timestamp.

use HTTP::Date;
my $timestamp = time2str();

To generate the base64 encoded HMAC-SHA-256 digest, you'll need the Digest::SHA module. As with the PHP, the $message is the parts of the message joined into a single string with newline separators.

use Digest::SHA qw{hmac_sha256_base64};
my $hmac = hmac_sha256_base64( $message, $secret );

There is one additional warning about PERL. The requirement about the length being a multiple of 4 isn't enforced, so you'll need to do your own padding with equal signs at the end.
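The padding rule itself is the same in any language. Here's a sketch of it in JavaScript for comparison; padBase64 is a hypothetical helper name, and the unpadded input is simulated by stripping the equal signs from a digest that Node.js has already padded.

```javascript
var crypto = require('crypto');

// padBase64 is a hypothetical helper name used only for this example.
// It appends equal signs until the length is a multiple of 4.
function padBase64(unpadded) {
  var padded = unpadded;
  while (padded.length % 4 !== 0) {
    padded += '=';
  }
  return padded;
}

// Node.js pads its base64 output, so simulate an unpadded digest by stripping the padding
var full = crypto.createHmac('sha256', 'EXAMPLE_SECRET').update('example message').digest('base64');
var stripped = full.replace(/=+$/, '');

console.log(stripped.length);              // 43
console.log(padBase64(stripped) === full); // true
```

A 32-byte digest always encodes to 43 significant characters, so exactly one equal sign is needed to reach 44.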

PERL Code Snippet

#!/usr/bin/perl
# This is a code snippet

# Standard modules
use strict;
use warnings;
use diagnostics;
use Carp;

# Specific modules required for signature
use URI;
use HTTP::Date;
use Digest::SHA qw{hmac_sha256_base64};

# Minimal information needed to generate signature
my $timestamp  = time2str();
my $url        = 'https://portal.inshosteddata.com/api/account/self/dump?limit=100';
my $api_secret = 'YOUR SUPER SECRET API SECRET GOES HERE';

# Generate signature
my $hmac = hmac_signature( $timestamp, $url, $api_secret );
printf("HMAC signature: %s\n", $hmac);

# Headers to add to request
my $api_key = 'YOUR API KEY GOES HERE';
my $headers = {
  'Authorization' => 'HMACAuth ' . $api_key . ':' . $hmac,
  'Date'          => $timestamp,
};

1;

# Subroutine to generate the proper signature
# This is a whittled down version from what Canvas suggested.
# Currently, the Content-type and Content-MD5 are not used
sub hmac_signature {
  my $timestamp = shift || return;
  my $url       = shift || return;
  my $secret    = shift || return;
  my $uri       = URI->new($url);
  my $host = defined( $uri->host ) ? $uri->host : 'portal.inshosteddata.com';
  my $query = $uri->query || '';
  if ( $query ne '' ) {
    $query = join( '&', sort( split( '&', $query ) ) );
  }
  my $parts =
    [ 'GET', $host, '', '', $uri->path, $query, $timestamp, $secret, ];
  my $message = join( "\n", @{$parts} );
  my $hmac = hmac_sha256_base64( $message, $secret );

  # The HMAC implementation doesn't enforce the proper length.
  # Pad the end with = signs until the length is a multiple of 4
  while ( length($hmac) % 4 ) {
    $hmac .= '=';
  }
  return $hmac;
}

What's Next?

This blog post only addresses the authentication issue. You still have to make the API calls. There are lots of ways that this can be done and almost every language has one or more methods or packages for making HTTP requests. I'll leave that part of the programming up to you.
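As one possible starting point in Node.js, here's a rough sketch that uses the built-in https module. The helper names buildRequestOptions and fetchBody are mine, the headers are assumed to have been generated already as described earlier, and the request itself is only shown as commented-out usage because it needs real credentials and network access.

```javascript
var https = require('https');

// buildRequestOptions is a hypothetical helper name used only for this example.
// It turns the full API URL and the prepared headers into options for https.get().
function buildRequestOptions(uri, headers) {
  var parsed = new URL(uri);
  return {
    method: 'GET',
    host: parsed.host,
    path: parsed.pathname + parsed.search,
    headers: headers
  };
}

// fetchBody is also hypothetical: it collects the response body as a string
// and hands the status code and body to the callback.
function fetchBody(options, callback) {
  https.get(options, function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, res.statusCode, body); });
  }).on('error', callback);
}

// Placeholder headers; real ones come from the signature step
var options = buildRequestOptions(
  'https://portal.inshosteddata.com/api/account/self/dump?after=45&limit=100',
  { Authorization: 'HMACAuth YOUR_API_KEY:YOUR_SIGNATURE', Date: new Date().toUTCString() }
);
console.log(options.host, options.path);

// Usage (requires real credentials and network access):
// fetchBody(options, function (err, status, body) { console.log(err || status, body); });
```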

I will say, however, that I've written a CanvasDataAPI module for PHP and PERL that can be included in your projects. They provide an object-oriented interface and return a data structure of the results.

I've already started extending the PHP version to write an SQL file to create a database with the schema obtained through the API. Another part will download all the files in a dump using wget. Another writes a bash script that can be used to take all the files from the dump and load them into a MySQL database. As I write this, it's working on my end, but it's not ready for publication yet.

The modules and the PHP and PERL code snippets shown above are available from my Canvancement site on GitHub.