rubyn
New Member

Will there be a data model diagram provided for Canvas Data?

Didn't see anything similar to an e/r diagram - or anything that depicts the relationships between the tables in the documentation portal.  Did I miss it? 

21 Replies
James
Community Champion

Everything I know about dimensional modeling has been learned since the advent of Canvas Data, so I won't claim to be an expert, but what I've read is that Entity Relation type diagrams aren't the right way to approach Dimensional Modeling. An ER diagram may contain multiple DM diagrams.

You have a Fact (fact) table that joins to Dimension (dim) tables using foreign keys specified in the fact table. Those foreign keys are specified in the schema on the documentation portal. Those docs say that "Canvas Data also mostly adheres to a Star schema convention, which means that most of the relations should only be one join away." That means that you should, for the most part, not have to worry about the ER model where you join A to B and then B to C so that A you can use A and C in the same report. Instead, The fact table (A) contains a key to both B and C directly so you join A to C and A to C.

That doesn't mean that a figure wouldn't be nice, but it isn't as beneficial (necessary) as with a ER diagram. The main difference is that instead of having one diagram with lines all over the place, you would have a different diagram (page?) with a few lines for each fact table (currently 22 of them). Those dimension tables that would be linked are in the fact tables. I actually thought about writing a program that would generate those diagrams automatically from the tables but I have too many projects that I start and never finish already sitting in the queue.

Someone else may have already produced such a program and Canvas may have them internally, but the answer to your question is that it isn't there on the documentation portal unless it's buried on an unpublished link.

Yes, I was looking for Instructure to provide a data model diagram in their documentation portal that shows the relationships between the tables.  What I've seen from other vendors is a view from the Dim and Fact tables, and the tables they relate to; so not a single e/r diagram, sorry for the confusion. 

rob_callicotte
Community Contributor

Has anyone found the table schema for these tables? That would be handy in the creation of the tables on this side.

 @rob_callicotte ​,

When you ask about the table schema, are you talking about something different than those in the documentation at  Canvas Data Portal ?

Although I haven't had a chance to look at it so I can't speak to it's use in this case, there is a Canvas Data CLI tool that is available that will help with fetching the data. It's in Javascript and uses the Nodejs library. If you'd rather use PHP or PERL, I've written some code that will allow access to the API through a class or module. That would give you access to the schema from one of those languages. I have written (but not published) code that will write a .sql file that can be used to create a MySQL database to house the information.

rob_callicotte
Community Contributor

James, I wrote a C# program that took the table data from each table on that same page and converted it to SQL CREATE scripts, but there is no listing anywhere of field sizes for VARCHAR, etc., which hampers appropriate setup using SSIS for creation and maintenance of tables. Until we get an official word from Instructure on exact sizes and keys, etc., we're just guessing.

 @rob_callicotte ​,

Download the schema through the API rather than trying to get it off the webpage. The schema through the API comes in JSON format and it has the information you're looking for in there.

When the type is varchar, there is an extra field called length. For example, here is the entry for the course_dim.name field.

{

"description":"The friendly name of the course.",

"type":"varchar",

"length":"256",

"name":"name"

}

The Schema returned through the API also lists the dim tables you need to use with the fact tables in a field called dimension.

The authentication portion of the API seems to give the most trouble to people. I wrote a document that explains it giving PHP and PERL examples. If you don't want to mess with it, links are provided at the bottom that have complete API modules that you can just use in your PHP or PERL code and not worry about the implementation. Canvas Data API Authentication

All of that said, all of the varchar fields in schema 1.3.0 are 256 bytes long with the exception of requests.id (50) and external_tool_activation_dim.url (4096).

rob_callicotte
Community Contributor

Thanks James. Are you saying then that using this (http://canvas.park.edu/api/schema/latest​&<access_code>) we can get the schema? That simple? If so, great. I've been using the API, but noticed that some of the available fields in the Canvas Data are not in the API. But, you are saying that doesn't keep us from knowing the schema for the Canvas Data, because it is the same as what we find in the API?

No, I am not saying anything close to that.

Canvas Data API is different from the regular Canvas API and has a different host and authentication scheme. When I mention API in this thread, I'm talking about the Canvas Data API, not the Canvas API.

They are two different beasts with some overlap as to information available, but otherwise, completely different systems with different URLs and different hosts. Canvas API interacts directly with the Canvas system and is live and real-time. The Canvas Data API provides read-only access to archived database tables that about a day old.

rob_callicotte
Community Contributor

Yes, I see...and I'm just now reading through your fixing up the sloshed information from Canvas about the Canvas Data API. This changes everything. Thanks James! You have been incredibly helpful.

rob_callicotte
Community Contributor

James, I have two more questions:

1. Have you tried any of this schema access via CURL?

2. Do you work with .NET or C# at all?

As for CURL, if you've used CURL with this, I'm wondering how to do it that way, since I could then get the data via a C# program I've written to utilize CURL. Here was an attempt to use CURL today -

curl --insecure -v -k -H "X-Api-Key: <key goes here>" -H "X-Api-Secret: <secret goes here>" http://portal.inshosteddata.com/api/schema/latest

I use cURL routines within PHP, but not directly from the command line. The problem with using the API from any command-line based tool is that the authorization hash is based on the current timestamp and you only have an approximately 15 minute window within which to use it (you could stretch it to almost 30 minutes if you played around with the timestamp).. So, although you could write a program (or use one of the existing ones -- JavaScript, PHP, PERL) to generate an HMAC authorization code and print it along with the timestamp, you would then have to transfer both of those to your cURL routine within the next 15 minutes. I've found it much easier to just make the API call using a REST client from within whatever language you use to generate the code than to generate the codes in one program and then mess with transferring them to another.

If you want just the JSON code so that you can parse it, you could install PHP for Windows, copy my CanvasDataAPI.php file, remove the line that decodes the JSON (or you could take what I return and save it as an XML or similar file) and then read/parse it with C#.

That said, you're sending as wrong headers in your cURL statement. They're not "X-Api-Key" and "X-Api-Secret" that you need to send. You need "Authorization" and "Date". Also, try https without the --insecure. It _might_ work with http, I'm not sure, but it's safest to use SSL.

I do not use .NET or C# or D-flat or any of those other variants. However, you should be able to take the three examples (JavaScript, PHP, or PERL) I had in the authentication document and find similar libraries for .NET or C#.  That document was the result of a lot of trial and error on my part and you could benefit by sticking as close to it as you can -- any deviation is likely to lead to more trial and error (and probably frustration) on your part. Once you get the authentication part figured out, you can use whatever you use for the regular Canvas API to make the actual calls.

rob_callicotte
Community Contributor

Thanks for all your input and advise and information. This is has been helpful.

Are you familiar with a schema.json file? We have recently discovered this file in our \CanvasData\Datafiles folder. Looks like it's complete and will prove to be very useful for all of this we have been asking about. I'm wondering if this file is something newly added to the downloads, etc.

 @rob_callicotte ​,

Are you using the Canvas Data CLI tool?

The reason I ask is because the Sync.js file has a downloadSchema method that saves the file to schema.json. So, yes, that is exactly the file you would be getting through all this other discussion. That's not my program and I haven't looked at it other than when questions like this come up, so I didn't know it was doing that to give that answer earlier.

The source code for that file hasn't changed in the last 25 days, so I don't think it's a recent addition, though.

rob_callicotte
Community Contributor

For the first download we did use the Canvas Data CLI tool, James. But, we're now working on accessing the data, after we get the database tables setup. We've been venturing into the API coding, in the meantime.

Would you help with some of your documentation? I'm working on the signature and hash, etc., but need to know a few things.

Here's your documentation:

There are really only three things that you need to know to create the HMAC-SHA-256 signature.

  • The timestamp. Most languages have a date() or similar function that will generate this for you. You can also use the strftime() function in many of them. Remember to use UTC or GMT time and not the local time.
  • The full URL of the API call.
  • The API Secret provided through the Canvas Data Portal.

Based on this, does this explanation include the correct order for the SIGNATURE, prior to running it through the hash? So, it would be timestamp then the URI and finally the SECRET? Here is what I'm doing in my calls below and, even if it is in C#, I think you will know what I'm doing or not doing (in general) --

I use the SIGNATURE, which includes the SECRET to add to the KEY. You can see below where it is combined in the strAUTH [this is the KEY] : [the colon] then the strSIGNATURE [this is the SIGNATURE]. Is this appropriate?

          httpWebRequest.ContentType = "text/json";

            httpWebRequest.Method = "GET";

            httpWebRequest.Host = "portal.inshosteddata.com";

            httpWebRequest.PreAuthenticate = true;

            httpWebRequest.Date = timestamp;

            WebHeaderCollection httpWebHeaderCollection = httpWebRequest.Headers;

            httpWebHeaderCollection.Add("Authorization", "HMACAuth " + strAUTH + ":" + strSIGNATURE);

So...two questions and one request:

1. What are the order of the values that go into the SIGNATURE?

2. Is the Authorization using the KEY plus the SIGNATURE (the combined elements) or only the SECRET or only the KEY and then the SECRET?

     a. If it is combined, are the two elements divided by a colon?

3. Chris Ellis asked, "Could you ask James to paste a header minus his API_Secret?"

 @rob_callicotte ​,

If you haven't already done so, please be sure to pass the link to Canvas Data API Authentication  to Chris Ellis.

Answers to most of the questions you ask and the request you make are already provided there. It is much more complete than the few snippets you provide here and you won't be able to pull out statements like "all you need are three things" without reading the rest that explains what to do with those three things.

To answer your questions:

1. What are the order of the values that go into the SIGNATURE?

The order of the values is provided in the section Existing Documentation > Signed Message. It's kind of lengthy with 8 parts, although many of them are empty, so I'll just refer you back to that part rather than listing them all again here.

2. Is the Authorization using the KEY plus the SIGNATURE (the combined elements) or only the SECRET or only the KEY and then the SECRET?  a. If it is combined, are the two elements divided by a colon?

This is documented in the section Existing Documentation > Authentication Headers.

There is a single Authorization header. You are correct that there needs to be a colon between the key and the signature. Without knowing C#, what you have supplied sounds reasonable:

httpWebHeaderCollection.Add("Authorization", "HMACAuth " + strAUTH + ":" + strSIGNATURE);

There are only two required headers that you need to add, Authorization and Date. I don't know what is necessary for C#, but I didn't need to set Host or PreAuthenticate, but you do need to make sure that date is part of the same group as Authorization and I can't tell that it is from your source.

3. Chris Ellis asked, "Could you ask James to paste a header minus his API_Secret?"

I've already done you one better. In Existing Documentation > Authentication Headers > Example for Testing, there is an API_KEY, API_SECRET, URL, and TIMESTAMP provided that you can use for testing. Hard code those into your code until you get the correct signature (the part after the : that starts with sOIJs). In that section you'll also see the proper way to format the two headers, which is kind of also an answer to the second question.

Authorization: HMACAuth 27f65b589c0c21f4bd29fd2f0e1cdf552a578f98:sOIJs/UZ7AySaRFfhRSFqDKlN93Ei+VvpZsVcKDfiJw=

Date: Tue, 01 Dec 2015 09:24:50 GMT

Verify that you can use those values and get the correct signature

     sOIJs/UZ7AySaRFfhRSFqDKlN93Ei+VvpZsVcKDfiJw=

Don't even mess with trying to make the HTTP Request until you have this part working as it's going to fail (that request will also fail because it was just example data). So, for testing purposes, hard code the information, use the example values I gave, make sure that part is working. Then, once you have the signature working, you can remove the hard-coded values and use your own and move on to the HTTP request part.

The only two headers you need for authentication to work are Authorization and Date. You may have to add some others for you client, but the server doesn't need them. What the server does is take the information in the request, computes it's own HMAC signature, compares it to the one you provided to make sure they match, and if so, authorizes the transfer.

The part after the signature is the easy part, getting that right signature is why the document I wrote was so long.

All of that said, depending on what you mean by "data", the API won't help you access the data. It provides the Schema, but the Canvas-Data-CLI tool takes care of the downloading of the data (and maybe more). That's all the API provides you -- the ability to know what to download and the CLI automates it for you. The API is stuff you do before you download the data and create the database, but it doesn't help you analyze it. In other words, you may be spending a lot of time to obtain something you already have.

rob_callicotte
Community Contributor

While this is fantastic and I appreciate your time on this, I think the SIGNATURE is very fuzzy. Here is why - in C# I do not need to include GET and I'm therefore not sure about all of the remaining 7 things in your list, because C# makes a way to handle some of these already. At least that is why I am having trouble with your list of 8 things.

Are there 3 things in the SIGNATURE or 8 things in the SIGNATURE? What are these? I know this is a dreadful thing to do to ask someone who has written such lengthy documentation to explain something that might appear straightforward to you, but those eight things are so fuzzy in the Instructure documentation, it is difficult to see through the techno-geek. It appears that I need:

1. "GET"

2. "portal.inhosteddata.com"

3. The full URL --- in your example, this is https://portal.inshosteddata.com/api/account/self/dump?limit=100&after=45

4. Timestamp (Tue, 01 Dec 2015 09:24:50 GMT, in your example)

5. The SECRET

Putting all of these in order hashed together with the SECRET equals the SIGNATURE. Correct? Or still incorrect?

On another note, this is an example in C# of this request setup -

HttpWebRequest httpWebRequest = WebRequest.Create(strURI) as HttpWebRequest;

            httpWebRequest.ContentType = "text/json";

            httpWebRequest.Method = "GET";

            httpWebRequest.Host = "portal.inshosteddata.com";

            httpWebRequest.PreAuthenticate = true;

            httpWebRequest.Date = Convert.ToDateTime(strTimeStamp);

     

            WebHeaderCollection httpWebHeaderCollection = httpWebRequest.Headers;

            httpWebHeaderCollection.Add("Authorization", "HMACAuth " + strAUTH + ":" + strSIGNATURE);

Stop reading Canvas' documentation and things will get a lot less fuzzy. Seriously! That's why I wrote the documentation -- because it wasn't clear to me and I spent days trying to figure things out.

The signature requires 8 items, although many are blank or always the same value (like GET). Those 8 items can be derived from 3 pieces of information: the request URL, the timestamp, and your API_SECRET.

Those 8 pieces are joined together with a newline character (not CRLF that is commonly used by Windows) and then processed through a HMAC-digest with a SHA-256 key (the key is the API_SECRET).

If you look at the Sacrificing Flexibility for Simplicity > Rewritten Code portion of the document, line 8 of the second block (after shaving 7 lines off) boils down to this:

var parts = [ 'GET', urlInfo.host, '', '', urlInfo.pathname, query, timestamp, secret ];

You could clean that up a little. It's composed of 8 strings:

[ 'GET', host, '', '', path, query, timestamp, secret ];

You don't put the URL into the signature directly -- you have to break it up into parts first like you did in your example. The urlInfo.host, urlInfo.pathname, and query all come from the URL. The timestamp and the secret are the other two pieces of information beside the URL that are required. You put empty strings in for the missing pieces (which is what's missing from your 5 part code). You might not think the empty strings are necessary because they don't add anything, but all the parts are joined together with a newline, so you are missing 3 newlines if you only have 5 parts and it won't match.

With a concrete example, the URL 'https://portal.inshosteddata.com/api/account/self/dump?limit=100&after=45' and the timestamp and secret key from my example would get broken into these 8 pieces:

'GET'

'portal.inshosteddata.com'

''

''

'/api/account/self/dump'

'after=45&limit=100'

'Tue, 01 Dec 2015 09:24:50 GMT'

'335df060619bcc3f8562d58a57c22c44b90ee122'

As noted, you only need three things to compute the signature: the URL, the timestamp, and the API_SECRET. When you make the request to portal.inshosteddata.com, you need to make sure the server there has the same information that you have. The request includes the URL, so the server gets that. And then you add the Authentication and Date as headers so that the server can get those.

The server then takes those three pieces of information and here's what it does:

  1. Checks the timestamp to make sure it's within 15 minutes of the time it has. If not, reject the request.
  2. Use the API_KEY that was specified as the first part of the Authentication: HMACAuth header to look up the API_SECRET
  3. Take the URL, the timestamp, and the API_SECRET and compute a signature in the same way that is explained in Canvas' and my documentation
  4. If the signature that the server computes matches the signature that you specified after the colon in the Authentication: HMACAuth header, then it authorizes the request and all is good.

I think you're getting confused by the signature and the required fields in the header. You do not need to put values from the signature into the headers. The only values that need added to the headers are Authentication and Date. That's because the server can figure out what the rest of them are from the HTTP request, which includes the method (GET) and the URL (which can be exploded to get the host -- although the server already knows what host it is, the pathname, and the query parameters).

I briefly skimmed some C# documentation and it appears that Date is a protected header, so you have to set it separately. The Authentication still needs added. But it looks like you should not have to add any other headers before making your request.

But like I said, make sure you get the signature part working before you even attempt the HTTP request part. And as I added last night, you probably don't need to do any of this if you're willing to let the Canvas-Data-CLI do the API work for you.

rob_callicotte
Community Contributor

How helpful!! Thanks James. This is much clearer.

I'm getting this data now, which is the same as yours (by using newline -- somehow missed that),

GET

portal.inshosteddata.com

/api/account/self/dump

after=45&limit=100

Tue, 01 Dec 2015 09:24:50 GMT

335df060619bcc3f8562d58a57c22c44b90ee122

What I'm wondering, though, is if my HMACSHA256 is being done correctly, since I get this result 'nPrRpTmMs+DplUVGskOap+ACqGPg4XhGOy0ipTy7mwc=' instead of what you show. I am wondering if I should use an array to house this data, since I am using a string and breaking it up via the '\n'. I'll check your information to see if I need an array instead.

You need to join those pieces together with newlines. It's easiest if they're an array to start with and then use a join() command of some kind, but you can manually create the string.

The HMAC Digest is ran on a string, though, not an array.

Also, you need to BASE64 encode (which it looks like you're doing) your HMAC binary output, not the hexadecimal representation of it.

James,

Thank you for the work you have put into explaining how this all works. I would be lost without it.

I am new to Canvas and am trying to get Canvas Data API working. I have tried your Perl scripts and just now found the CLI tool you mentioned. I'm getting an error when I try to fetch data from Canvas. From your Perl, I get "Unauthorized" and from the CLI tool I get this error message when I run "canvasDataCli sync -c config.js":

fetching current list of files from API...
an error occured
{ [Error]
errorCode: 401,
resp: { error: true, status: 401, message: 'Invalid api key' } }

I have the key and secret set as the appropriate environment variables. I created the key and secret through the Canvas Data Portal. Do you know what I should try next?

0 Kudos

Actually, I think I figured out the CLI. I had put the key and secret in quotes when setting the environment variables. When I reset the variables without quotes, it now appears to authenticate properly. What should I look at to get the perl program working. I did run the perl hmac code snippet with your test inputs and got the correct HMAC signature. I tried querying the api from my perl script using this:

my $urls = `curl -H "$headers" $url`;

printf("%s\n", $urls);

and I get "Unauthorized" in response.

0 Kudos