Community help

reynlds · ‎05-25-2023

In CD1 we had a very verbose identifier for things like course IDs that made it easy to identify and join an assignment to a course. In CD2 we have to join together a shard and a context_id "on the fly" (I still haven't figured out how best to do this) in order to give structure to my queries. Is there an easier way to do this? Or can this just be added back to the tables?

To me it's another example of how Instructure can take something simple and introduce complexity for no reason (unless I'm not seeing a bigger picture...which, I'll admit, does occasionally happen).

ColinMurtaugh · ‎05-26-2023

@reynlds --

Joining courses and assignments in CD2 should be straightforward (at least it is in my instance). For example:

select courses.id, courses.name, assignments.id, assignments.title
from courses, assignments
where courses.id = assignments.context_id
and assignments.context_type = 'Course'
and assignments.created_at > now() - interval '1 day';

I actually find the data in CD2 to be easier to work with since I don't have to deal with shard IDs and converting between local and global IDs. The IDs I see in CD2 are local IDs and match the IDs that I see via the API and in Canvas itself (e.g. course IDs in URLs).

In any case, if you do need to translate between global IDs (the long ones which include the shard) and local IDs (shorter), here is the logic. You might need to make adjustments for your programming language.

To get a local ID from a global ID:

local_id = global_id % 10000000000000

To get the shard ID from a global ID:

shard_id = int( global_id / 10000000000000 )

To get a global ID from a shard ID and a local ID:

global_id = (shard_id * 10000000000000) + local_id
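
In Python, for example, the three formulas above might be sketched as follows. The names SHARD_FACTOR, split_global_id, and make_global_id are made up for this illustration. The sketch uses integer arithmetic (divmod) rather than int(global_id / ...), because "/" yields a float in Python and could lose precision on very large IDs.

```python
# 10**13, the multiplier used in the formulas above.
SHARD_FACTOR = 10_000_000_000_000


def split_global_id(global_id: int) -> tuple[int, int]:
    """Return (shard_id, local_id) for a global ID."""
    # divmod performs integer division and modulo in one step,
    # so no floating-point rounding is involved.
    return divmod(global_id, SHARD_FACTOR)


def make_global_id(shard_id: int, local_id: int) -> int:
    """Combine a shard ID and a local ID into a global ID."""
    return shard_id * SHARD_FACTOR + local_id
```

An ID that is already local (smaller than SHARD_FACTOR) comes back from split_global_id with a shard ID of 0.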

--Colin

LeventeHunyadi · ‎05-26-2023

Data Access Platform (DAP, and by extension, CD 2) has no concept of a shard. Data is partitioned by table and root account UUID. The DAP API returns table data for a specific root account UUID. Typically, a single root account UUID is associated with the client key and secret that you use to access the service.

Shard is an implementation detail. It is related to how the data is organized into databases internally. It may be subject to change if that internal structure changes. It is not a stable identifier. CD 1 made the mistake of exposing this internal detail to the public.

DAP table primary keys are local to the root account UUID in which they are used. Let's suppose your university has two root accounts A and B. Both root account A and root account B can have a user with ID 42. If you need to process entities from different root accounts together, you must augment the record with the root account UUID, and form a composite key (a.k.a. a tuple):

(A, 10)
(A, 42)
(B, 12)
(B, 42)
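
As a rough illustration of the composite key idea, here is a minimal Python sketch. The records, the names, and the labels "A" and "B" (standing in for real root account UUIDs) are all hypothetical; only the (root account, id) tuple comes from the description above.

```python
# Hypothetical user records fetched separately for two root accounts.
users_a = [{"id": 10, "name": "Ann"}, {"id": 42, "name": "Bob"}]
users_b = [{"id": 12, "name": "Cy"}, {"id": 42, "name": "Dee"}]


def with_composite_key(root_account_uuid, records):
    """Index records by the tuple (root_account_uuid, local id)."""
    return {(root_account_uuid, r["id"]): r for r in records}


# Both accounts contain a user with ID 42, but the composite keys differ,
# so neither record overwrites the other when the data is merged.
combined = {
    **with_composite_key("A", users_a),
    **with_composite_key("B", users_b),
}
```

Looking up combined[("A", 42)] and combined[("B", 42)] then returns two different users, even though both have the local ID 42.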

KeithSmith_au · ‎05-28-2023

That is not really true if you have a consortium.

You cannot just augment with the root account UUID, because non-local references (such as user id) have the shard value. You are forced to utilise shards and process them (as Colin has indicated) otherwise you won't be able to resolve all the keys.

reynlds · ‎06-07-2023

@LeventeHunyadi I understand the separation of accounts, institutions, etc., as a way to basically create a "walled garden" for each entity to be able to use multiple, similar identifiers. However, we've got one account (to rule them all), so are we the exception? Do most Instructure/Canvas subscribers have more than one account, requiring them to utilize the data in this manner?

LeventeHunyadi · ‎06-08-2023

Internally, Canvas maintains a separate database (an isolated instance) for each larger root account; this is what is called a shard. Multiple smaller root accounts can share a single shard. Identifiers (such as those of accounts, courses, quizzes, submissions, etc.) are local to a shard: they uniquely identify an object within a shard but not across shards. For example, if you have a course in shard A and another in shard B, both can have the same local ID of 2.

If your institution lives in a single shard (and most do), there is nothing more you need to do; you can safely cross-reference identifiers with one another. If your institution spans multiple shards and you want to store all data in one large table (usually not practical), you need to prefix identifiers with the root account UUID in order to make them globally unique. If your institution spans multiple shards but you separate data by root account (as we do in CD 2/Data Access Platform), identifiers are again safe for cross-referencing (within the root account).

CD 1 "globalized" identifiers by adding a large number to the local identifier to make a combined identifier that incorporated the shard number. CD 2 no longer does any transformation on identifiers; it returns them the way they are stored in the Canvas database.

reynlds · ‎06-08-2023

Thanks, @LeventeHunyadi. I think my initial issue was that I did not realize that the "context_id" in the assignments table was referencing the course ID in the courses table. You are correct: since we are a single institutional space, it makes it easier.

KeithSmith_au · ‎06-08-2023

I repeat - for institutions that have a consortium with trust relationships and multiple instances, what you are saying is NOT correct. With CD2 doing no translation, non-local references, such as the user_id in an enrolment, carry the large shard reference. You cannot resolve that ID unless you perform the translation, as that actual identifier does not exist in the other database. You are forced to mod the identifier to obtain the local ID reference, then take the high-order digits to determine the root account, and thus generate the tuple.
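
The translation described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the 10**13 factor is taken from the formulas earlier in the thread, the SHARD_TO_ROOT mapping and its values are hypothetical and would have to be built for your own consortium (several root accounts may share a shard, so a real mapping may need more context), and an ID below 10**13 is assumed to be local to the root account whose data is being processed.

```python
SHARD_FACTOR = 10**13  # factor from the formulas earlier in the thread

# Hypothetical mapping from shard ID to a root account label.
SHARD_TO_ROOT = {1234: "A", 5678: "B"}


def resolve_reference(raw_id, home_root):
    """Turn a reference such as user_id into a (root account, local id) tuple."""
    # High-order digits give the shard, the remainder gives the local ID.
    shard_id, local_id = divmod(raw_id, SHARD_FACTOR)
    if shard_id == 0:
        # Assumption: a short ID refers to the root account being processed.
        return (home_root, local_id)
    # Non-local reference: look up which root account the shard belongs to.
    return (SHARD_TO_ROOT[shard_id], local_id)
```

With this, an enrolment in root account "A" whose user_id is 5678 * 10**13 + 42 resolves to ("B", 42), which can then be matched against the composite keys of the users table.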

Please explain how to use "shards" in CD2 to identify things
