cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
s_moles
Community Participant

Increased length of surrogate keys

Jump to solution

Hello. It appears that our surrogate keys recently increased in length by the addition of '111' at the start of all of them. For example, a course_dim id that previously had been 60000000123456 is now 11160000000123456. This led to issues with our loading the data into a local Oracle database for reporting.

Should we anticipate future changes of this sort? What is the best way to be informed about them before they happen.

Thank you.

0 Kudos
1 Solution

Accepted Solutions
ccoan
Instructure
Instructure

Hello,

Happy to chime in here. So what actually happened is unique to your University. The change that occurred here was a result of something called a shard split. A Shard Split is a maintenance event some of our older customers have to go through due to a change in our architecture we had a few years ago.

Generally we warn users of the affect this has on Canvas Data, but it turns there was a slip in this process, something which we've since corrected, and enforced following the template. So there's little to no means of miss communication.

Essentially what happens in a shard split is their internal "Shard ID" changes. Since the "id"'s are based off of the Shard ID, these IDs of objects also changed (increasing in length). This is a stable change, and from now on you should always have IDs the same length.

One part I'm confused about though is why you're using IEEE-754 for IDs? IEEE-754 deals with floating point numbers, even though an ID will never be a floating point number. Was there documentation somewhere we can update that led you to this conclusion?, Maybe somewhere we can clarify that a bit better. Anyway you should be able to use a BIGINT (or other 64 bit integer), or a String to represent these IDs.

If you have any further questions please don't hesitate to ask.

Thanks,

Eric

View solution in original post

4 Replies
amcdona
Community Participant

I double checked several days worth of extracts (11/10 -11/14) for my account and don't see any changes.

s_moles
Community Participant

Thanks Andrew for checking.

We are still facing an issue with this however, and I'm really hoping someone from Instructure will respond.

The addition of the three digits increased the length of our primary keys from 14 digits to 17. This is causing an issue for us due to a 15-digit constraint of our BI tool (Cognos) that is based on the IEEE-754 standard. We may be able to work around the limitation by converting them to strings, removing the first four digits and converting them back to numbers, but before we do that, we want to know that the keys are now stable--that additional changes to them won't happen.

I'm surprised this issue has not cropped up for anyone else, which makes me wonder if we are the only ones to see changes to the primary keys, or if there is something unusual, and maybe not so smart, about the way we're trying to use the data.

Thanks in advance,

Brook Moles

ccoan
Instructure
Instructure

Hello,

Happy to chime in here. So what actually happened is unique to your University. The change that occurred here was a result of something called a shard split. A Shard Split is a maintenance event some of our older customers have to go through due to a change in our architecture we had a few years ago.

Generally we warn users of the affect this has on Canvas Data, but it turns there was a slip in this process, something which we've since corrected, and enforced following the template. So there's little to no means of miss communication.

Essentially what happens in a shard split is their internal "Shard ID" changes. Since the "id"'s are based off of the Shard ID, these IDs of objects also changed (increasing in length). This is a stable change, and from now on you should always have IDs the same length.

One part I'm confused about though is why you're using IEEE-754 for IDs? IEEE-754 deals with floating point numbers, even though an ID will never be a floating point number. Was there documentation somewhere we can update that led you to this conclusion?, Maybe somewhere we can clarify that a bit better. Anyway you should be able to use a BIGINT (or other 64 bit integer), or a String to represent these IDs.

If you have any further questions please don't hesitate to ask.

Thanks,

Eric

View solution in original post

s_moles
Community Participant

Thanks for the explanation, Eric.

I'm sorry if I muddied the waters with the reference to IEEE-754. The basic issue is that our BI tool, Cognos, is 32-bit and can accommodate numbers only to a precision of 15 digits. There is an explanation for that in an IBM support doc, which, as I read it, attributes the limitation to IEEE-754.

Using the Cognos modeling tool, Framework Manager, we create a metadata layer that includes how tables should be joined. Of course, the joins between the Canvas Data tables make extensive use of the ids. When possible, Cognos pushes the joins to the database, which in our case is Oracle and 64-bit. When that happens (i.e., when a join is executed in a query run on the database), everything works fine.

But in our model, we have combined the Canvas data with data from other systems, including our student system, to provide it with more context. That means the data is coming from multiple databases, which in turn means that Cognos cannot push all the queries to "the" database. The result is that some joins are executed locally on the Cognos server--including joins between Canvas tables. When that happens, the ids do not do their job. They are essentially rounded up to the 100s place, which means, if we're joining files to courses on the course id, for example, there can be a cross product of multiple courses. All the course ids that get rounded to the same number get joined to each other. Not good.

Had we created a data mart, pulling the data from the various systems into a single database, we would have avoided the problem, but in the name of expediency, we opted to bring it together in the Cognos model instead. Didn't turn out to be very expedient, but we're too far in at this point to go back and do it the other way.

So, we'll follow your suggestion and treat the ids as strings.

Thanks again.