While I was originally happy to see that the long IDs in Canvas Data 1 (e.g. 64750000000043215) had been removed in Canvas Data 2, the time eventually came for me to rewrite all the queries we have in place for various uses. These queries mostly join Canvas Live Events with one or more Canvas Data tables. With Canvas Data 1 this was relatively easy, as the long IDs were present in both CD1 and Canvas Live Events. The migration to CD2, however, presents an additional challenge: the long IDs remain in Canvas Live Events.
While it's not impossible to convert one to match the other (just for the purposes of running queries), it is definitely not convenient. In addition, it would make sense for Instructure to use the same convention in both data sources, of course. Are there any plans to do this?
Unfortunately, I am not aware of any short-term plans to change how Live Events produces entity identifiers.
When designing CD 2, we wanted to represent identifiers in the way that is closest to the source. Internally, local IDs are used for references within a root account (e.g. courses) and global IDs are used for references outside one's root account (e.g. some of the users). Specifically, users who live in one root account but have a replica in another root account have a global ID in CD 2, whereas normal users (users in their home root account) have a local ID. As a result, it is possible to differentiate local and replicated users in CD 2, which was not possible in CD 1.
There is a one-to-one mapping between global IDs and local IDs: global IDs are constructed from the so-called shard ID and the local ID. Among other places, the shard ID is shown in the Canvas Site Admin interface. The shard ID is written in the high decimal digits, and the local ID is in the low decimal digits of the global ID. The recommended practice to join tables that use different types of identifiers is to either drop the shard ID component of a global ID (for single-account institutions), or augment the local ID with the shard ID (for multi-account institutions). We use these same transformations internally for combining data. In particular, some of our data stores use a pair of shard ID and local ID for each record.
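As a rough illustration of the mapping described above, here is a minimal Python sketch. It assumes the local ID occupies the low 13 decimal digits of a global ID (so that 64750000000043215 splits into shard 6475 and local ID 43215); the constant `IDS_PER_SHARD` and the function names are illustrative and not part of any Instructure API, so verify the digit width against your own data before relying on it.

```python
# Assumption: a global ID is shard_id * 10**13 + local_id.
# IDS_PER_SHARD and the function names below are hypothetical helpers.
IDS_PER_SHARD = 10**13


def split_global_id(global_id: int) -> tuple[int, int]:
    """Return (shard_id, local_id) for a global ID."""
    return divmod(global_id, IDS_PER_SHARD)


def to_local_id(any_id: int) -> int:
    """Drop the shard component (single-account institutions)."""
    return any_id % IDS_PER_SHARD


def to_global_id(shard_id: int, local_id: int) -> int:
    """Augment a local ID with a shard ID (multi-account institutions)."""
    if not 0 <= local_id < IDS_PER_SHARD:
        raise ValueError("local_id does not fit in the low digits")
    return shard_id * IDS_PER_SHARD + local_id


shard_id, local_id = split_global_id(64750000000043215)
print(shard_id, local_id)                # 6475 43215
print(to_global_id(shard_id, local_id))  # 64750000000043215
```

Note that `to_local_id` leaves an ID that is already local unchanged, so it can be applied to a column that mixes both forms.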
For most users this is a moot point, but you do need to understand that there is a real meaning behind this. If you only have a single Canvas instance, then all the ID references are local (to entities in the same instance). If you are a large organisation running a consortium (multiple instances of Canvas tied together with a trust relationship to allow for large scale), then entity references can either be local (same instance) or point to entities in a different instance. In this case, the long form is necessary to determine the actual entity, since the IDs are not unique across the instances.
The ways in which this is handled are not always consistent (as you are discovering). Working with the large amounts of data that come with having a consortium (we have 12 Canvas instances in a consortium in production), the most efficient way I have found to deal with this is to augment the tables on ingestion with the relevant long (sharded) reference for all columns where a non-local reference is possible, and then always use a globally unique long reference to do the joining. Many things cannot reference across instances: for example, a course section cannot be in a different instance to its course, but an enrolled user can be.
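The ingestion-time augmentation described above could look something like the following sketch. It is not the poster's actual pipeline: the `_global` column suffix, the function names, and the 13-digit split (`IDS_PER_SHARD`) are assumptions made for illustration. Values that already look global are kept as they are, and local values are prefixed with the shard ID of the instance the row was exported from.

```python
# Assumption: a global ID is shard_id * 10**13 + local_id.
IDS_PER_SHARD = 10**13


def globalize(value, home_shard_id):
    """Return a globally unique ID for a column value that may be local or global."""
    if value is None:
        return None
    if value >= IDS_PER_SHARD:
        # Already carries a shard component (e.g. a cross-instance user).
        return value
    return home_shard_id * IDS_PER_SHARD + value


def augment_row(row, id_columns, home_shard_id):
    """Copy a row, adding a hypothetical '<column>_global' field per ID column."""
    out = dict(row)
    for col in id_columns:
        out[f"{col}_global"] = globalize(row.get(col), home_shard_id)
    return out


# An enrollment exported from the instance on shard 6475 (made-up values):
# the course is local, the user lives on another instance.
row = {"course_id": 77, "user_id": 64760000000000012}
print(augment_row(row, ["course_id", "user_id"], home_shard_id=6475))
```

Joining on the `_global` columns then works the same way regardless of which instance a row came from.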
Interesting point @KeithSmith_au about the multiple Canvas instances (we have only one, and although I was aware that the long IDs include a reference to a shard, I had completely forgotten when they could be useful). Your solution seems to involve quite a bit of work, though, and it might not be the best one for institutions that have only one instance. In my case, for now I have resorted to calculating an extra field on the Live Events stream that gives me the local reference, so that I can join on it, but I can see why you may need to augment the CD2 tables. I am still interested in hearing from Instructure, both about their thoughts or plans to make the handling more consistent across data sources, and about whether they recommend a particular best practice for dealing with the inconsistency (I imagine they must have considered this point when they moved from the long IDs to the local IDs in CD2).
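For a single-instance setup, the extra local-reference field on the event stream might be computed along these lines. This is only a sketch: the event and table field names are made up, the IDs are assumed to arrive as strings in the event payload, and the 13-digit split is an assumption to check against real data.

```python
# Assumption: the local ID is the low 13 decimal digits of a long ID.
IDS_PER_SHARD = 10**13


def local_ref(raw_id):
    """Convert a long ID (string or int) from an event into a local ID."""
    return int(raw_id) % IDS_PER_SHARD


# Hypothetical event payloads and a hypothetical CD2-style lookup keyed by local ID.
events = [
    {"event_name": "asset_accessed", "course_id": "64750000000000077"},
    {"event_name": "asset_accessed", "course_id": "64750000000000078"},
]
courses_by_local_id = {77: "Biology 101", 78: "Chemistry 101"}

for event in events:
    event["course_local_id"] = local_ref(event["course_id"])
    event["course_name"] = courses_by_local_id.get(event["course_local_id"])

print([(e["course_local_id"], e["course_name"]) for e in events])
```

Dropping the shard component like this is only safe when every reference points into the same instance; with several instances, two different entities could end up with the same local ID.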