I keep trying to use Live Events, but every time I'm stumped by the fact that the messages don't contain useful identifiers, for example:
I want to know when an assignment is created or updated so that I can get information regarding said assignment and place it into a student's calendar.
If I use Live Events for this, I subscribe to the Assignment Created and Assignment Updated events, then wait for the magic to happen.
I receive a message telling me that an assignment has been created, the message contains the information for the user who created it, the assignment itself and the course within which it was created.
The user does come through with a sis_source_id, huzzah; for the other two though, the course and assignment, all I get is the _id_ and _extensions.com.instructure.canvas.entity_id_, both of which I can only seem to match up against information I glean from Canvas Data. Canvas Data is at least 48 hours out of date - what use is a "live" event when I can't locate the entity within Canvas without waiting at least 2 days to look it up in a Canvas Data extract?
What I'd really like is an honest Canvas ID for the entities referenced in the Amazon SQS message, or at least maybe the sis_course_id or sis_integration_id, then I could limit my searching through the standard API.
Has anyone solved this?
Hi jack0x539
I'm still learning these concepts, but here's my understanding.
Canvas Data has high latency, but is highly accurate. It's old when we get it, but is generally correct.
Live Events has low latency, but is less accurate. Its inaccuracy lies in which events are available and what you subscribe to. There's also the possibility that events aren't published to the queue if the code fails to complete/send the event.
Together, the two form what is called Lambda architecture - Wikipedia
We have a batch layer (Canvas Data) and a speed layer (Live Events).
Neither solution solves all problems.
Our schools and teachers have an LTI we wrote using Canvas Data, and it has notices on each page saying the data is old. We have an attendance screen in that LTI that they liked, but they had to wait until Monday to take attendance for Friday. Using Live Events I was able to combine it with Canvas Data and show real-time user activity and submissions by appending rows from Live Events to Canvas Data, typically by creating queries on both sets, selecting distinct columns, and then using a UNION. Specifically, submission_dim along with LE's submission_created and submission_updated, to get submission counts for the student including what they've submitted today.
The drawback is that now my entire data workflow has to include more than just submissions and CD. I have highly accurate submissions rolled up in submission_dim, with an attempts column that shows me X attempts were made, but the timestamp is based on the last submission. In Live Events we have every attempt; I don't want every attempt (for these purposes), I want the last timestamp and the submission_id, so I can link it to the assignment id and then show that this student submitted that assignment 10 minutes ago.
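To make that concrete, here's a rough sketch of the kind of UNION I'm describing. The Live Events side (le_submission_events and its columns) is a made-up placeholder for whatever your ingestion process writes, so treat this as a pattern rather than a drop-in query.

-- Combine the batch layer (Canvas Data) with the speed layer (Live Events),
-- then keep only the most recent timestamp per submission.
-- le_submission_events and its columns are hypothetical; adjust to your own schema.
WITH combined AS (
  SELECT id AS submission_id, assignment_id, user_id, submitted_at
  FROM submission_dim                -- Canvas Data, at least 48 hours behind
  UNION
  SELECT submission_id, assignment_id, user_id, event_time AS submitted_at
  FROM le_submission_events          -- rows appended from submission_created / submission_updated
)
SELECT submission_id, assignment_id, user_id, MAX(submitted_at) AS last_submitted_at
FROM combined
GROUP BY submission_id, assignment_id, user_id;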
The precision of your data choices will depend on your use case; here are some considerations:
To do this without a 24+ hour delay we will need to account for the following:
Courses, Assignments, Submissions, Users, and Enrollments from Canvas Data
The same from Live Events, for whatever time frame of events we store. I'm leaning toward 7 days. The overlap helps bridge the latency gap. I'm still waffling about how long to hold them, but in general I'd say cover the latency gap in CD and allow for fault tolerance in case CD doesn't import for a couple of days, like if it fails to import over the weekend and I'm not in the office.
This means that if a student was enrolled in the course today, or an assignment was created and published today, I also have to include the LE data for the user, enrollment, or assignment that doesn't exist in CD yet.
This means I stop joining LE to its CD equivalent and I write an SQL view that combines CD user_dim with LE user_created, CD enrollments with LE enrollment_created + enrollment_updated (reduced to the most recent workflow_status), and CD assignment_dim with LE assignment_created. Now I have views that merge batch with streaming for accurate, low-latency purposes.
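As a rough illustration, one of those views might look like this. The Live Events table and column names (le_user_created and so on) are made-up placeholders for whatever your ingestion process creates, so this is a sketch of the pattern rather than working DDL.

-- Merge the batch layer (CD user_dim) with the speed layer (LE user_created).
-- le_user_created and its columns are hypothetical names.
CREATE VIEW combined_users AS
SELECT id, name, workflow_state
FROM user_dim                        -- Canvas Data, roughly 48 hours behind
UNION
SELECT user_id AS id, name, workflow_state
FROM le_user_created                 -- captured from the user_created event
WHERE user_id NOT IN (SELECT id FROM user_dim);  -- only users CD hasn't caught up with yet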
The use case you describe: an assignment is created and published, the student gets visibility of the assignment, and the student then gets an update to their calendar with the assignment due date? Isn't this a default in Canvas?
Since the assignment event is sent with the instructor's info, or at a minimum has nothing to do with the student, we have to take the assignment, its status, the course, its status, the course's enrollments, and the users, and combine LE with CD where necessary.
Here's the problem...
You are using the Caliper format, and currently this makes your use case difficult, possibly impossible, because there are no user_created/updated events in Caliper. There also isn't a course_updated event, but there is course_created. I believe Canvas is working on adding new events for Caliper, but they aren't available yet. I prefer the Canvas Alpha (or Raw) format: more events, more CD/API compatible. Live Events: Event Type by Format
This fall there are changes coming that should allow us to subscribe to either, both, or a mix of Caliper and Alpha formats.
Hypothetically, there are other ways of doing this, possibly by using LE alone. But the ingestion process would likely look more like an UPSERT, updating records that exist and inserting new rows, versus storing all events and working off the last status. There are even more ways of ingesting and combining CD and LE, but that's not the topic.
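For what I mean by UPSERT-style ingestion, here's a minimal sketch. The le_assignments table and its columns are made-up names, and the ON CONFLICT clause is PostgreSQL syntax; other databases have their own equivalents.

-- Upsert each incoming assignment event instead of appending every event row.
-- Table and column names are hypothetical; ON CONFLICT is PostgreSQL-specific.
INSERT INTO le_assignments (assignment_id, title, workflow_state, updated_at)
VALUES (:assignment_id, :title, :workflow_state, :event_time)
ON CONFLICT (assignment_id)
DO UPDATE SET
  title          = EXCLUDED.title,
  workflow_state = EXCLUDED.workflow_state,
  updated_at     = EXCLUDED.updated_at;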
The Caliper IDs and the nesting are pretty awkward, but I wrote a couple of functions that reduce the nested, verbose field names to underscore_notation and remove the extensions|com|instructure|canvas prefix for LED.
_flatten and _squish ledbelly/ims_caliper.rb at master · ccsd/ledbelly · GitHub
While I don't use Caliper, LED can consume them, and I'm open to un-complicating some of its specs and aligning them with the Alpha format, CD, and the API. I mention that here: LED Known Issues
I haven't dug too deeply into actually using live events in this way, yet, personally. Just monitoring things at the moment out of curiosity, so don't quote me on this...
But it *seems* that Live Events are forward-looking, not backward-looking, and the intended integration is with the GraphQL API as it develops, not the existing REST API and the existing legacyNode integer _ids.
I can't find it documented anywhere (other than the source code itself) but it certainly looks like Live Events are using the same Shard.global_id_for() function as the GraphQL Loader functions (or vice versa).
I take that to mean that the new GraphQL node ids will map to the Live Events entity ids, and are, in fact, the same globally unique ids that Canvas Data has always used.
It would be nice to see that confirmed and documented somewhere, but it appears to be the case.
I spot-checked a few cases, and plugging random "id" strings from assignment_fact or assignment_dim into GraphQL queries seems to return the expected results, e.g.:
query MyQuery {
  assignment(id: "<assignment_fact_id>") {
    id
    _id
    dueAt
    description
    createdAt
    course {
      _id
      courseCode
    }
    htmlUrl
    hasSubmittedSubmissions
  }
}
Shard and global IDs make sense, but Caliper isn't aligned with Canvas; Canvas aligns with the Caliper spec.
Looking at Caliper Events : Message Structure and Properties we see that a user id is
urn:instructure:canvas:user:21070000000000049
OK, whatever.
What's less comforting is the concatenated IDs that exist in other fields.
membership_id: urn:instructure:canvas:course:100000000000012:Instructor:100000000000001
Or that an actor_id, which is generally the user who triggered the event, can be a URL. :smileycry:
This is entirely different than events published through the Alpha/Raw format.
Alpha Payload Example : asset_accessed
Caliper Payload : NavigationEvent
This leaves the end user (us programmers) to resolve this to the shard id in order to use it against Canvas systems. Maybe this is only an issue with forcing the spec into SQL; there are some fancy big data programs out there that make these non-issues. That equipment isn't on the playground for most of us, which makes the Canvas Alpha format in SQL, aligned with CD and the API, the most appealing. I'm sure more solutions and ideas will come out and the implementation will improve.
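For what it's worth, this is the sort of string surgery I mean by resolving those IDs. split_part is PostgreSQL-specific and caliper_events is a made-up staging table, so it's only a sketch:

-- Pull the numeric Canvas global ID out of a Caliper urn like
-- urn:instructure:canvas:user:21070000000000049
-- split_part() is PostgreSQL; other dialects need their own string functions.
SELECT split_part(actor_id, ':', 5)::bigint AS user_global_id
FROM caliper_events                      -- hypothetical staging table
WHERE actor_id LIKE 'urn:instructure:canvas:user:%';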
However, the concept of using LE in conjunction with GraphQL sounds appealing. I haven't spent much time with GraphQL yet; we should do a little hack chat and compare notes. :smileygrin:
That doesn't seem to really be the whole story, though.
There are two more pieces to the puzzle: 1) those IDs (used for both Caliper and Alpha events) are constructed in the way that Canvas is *already* constructing the ids for the Data Portal and 2) GraphQL is using the same helper functions as the data portal, which is undocumented, but probably by design...
So while a Caliper "actor" may be a URL, if the actor is a user, Canvas always sends a useful Canvas ID through its com.instructure.canvas namespace extensions in the format of shardId+000000000000+user_id, as extensions.com.instructure.canvas.id / urn:instructure:canvas:user, etc.
If we look at assignment creation, for instance, Caliper's extensions.com.instructure.canvas.entity_id is Alpha's assignment_id (and is also used to construct the urn):
Caliper (Use Case 3: An assignment was created - basic course assignment):
{
  ...
  "object": {
    "id": "urn:instructure:canvas:assignment:21070000000000371",
    "type": "AssignableDigitalResource",
    "name": "add_new_assignment_3",
    "description": "<p>test assignment</p>",
    "dateCreated": "2018-09-28T22:58:50.000Z",
    "extensions": {
      "com.instructure.canvas": {
        "lock_at": "2018-10-01T05:59:59.000Z",
        "entity_id": "21070000000000371"
      }
    },
    ...
  }
Alpha (Payload Example: assignment_created):
{
  ...
  "body": {
    "assignment_id": "21070000000000565",
    ...
  }
  ...
}
This is also the BIGINT that will be assigned as 'assignment_fact'.'assignment_id' and 'assignment_dim'.'id' when those records are created 48 hrs. later.
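As a quick sanity check of that claim, something like the query below lines the two up once the extract arrives and peels off the REST-addressable local id. le_assignment_created is a made-up staging table, and the arithmetic assumes the usual global_id = shard_id * 10^13 + local_id construction, so treat it as a sketch.

-- Match a Live Events assignment id against Canvas Data once it arrives,
-- and derive the local id usable with the REST API.
-- le_assignment_created is hypothetical; the shard math assumes
-- global_id = shard_id * 10^13 + local_id.
SELECT le.assignment_id                   AS global_id,
       le.assignment_id % 10000000000000  AS rest_api_id,  -- e.g. 21070000000000565 -> 565
       ad.title
FROM le_assignment_created le
JOIN assignment_dim ad ON ad.id = le.assignment_id;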
So far, so good.
Now here's the bit that, although undocumented, seems to be true: the GraphQL Loader:: functions use the same helper functions as the Data Portal to parse and create identifiers, and the upshot is that while the documentation says GraphQL queries will accept either the new node id format or the "legacy" object id format, they will *also* accept the Alpha/Caliper/Data Portal ids... at least at the moment.
This is a live example from my instance. Note that it returns the new node "id:" and the legacy "_id:", but accepts the raw CD/Alpha/Caliper shard+000000000000+id format as input:
(actually this just came through while I was typing: apparently agschmid has found that you *must* use the CD/LE identifiers as the legacy _id in some cases: https://community.canvaslms.com/message/152021-problems-with-using-graphiql-code-in-insomnia?et=watc...)
Must not always hit everything with a hammer.
There are probably dozens of ways to use Live Events and a combination of any of the API, GraphQL and Canvas Data.
I'm consuming them in real time into SQL, and LED is set to delete the events after processing, but the events don't have to be deleted. In fact, it's possible, and on my roadmap, to pass events to different queues and subroutines. Currently, if I receive an event type I wasn't expecting, it is passed to a second queue. Events could also be saved and reprocessed by another application and cleaned up later, or passed to separate processes depending on the special use of a single event type. Any combination works depending on the overall needs of the institution and what they can work out.
i.e., they can be re-routed to separate queues, applications, and destinations like SQL, S3, or NoSQL.
@jsavage1 makes a good suggestion for trying this without SQL. I generally shy away from getting data from the API because we have too many records to make it worthwhile, but GraphQL solves a lot of those issues. Since not all the values are available in real time for a combination of Caliper events, I'm wondering what the workflow would look like to solve this with the available Live Events, GraphQL, and the API.
More responses than I was expecting after just a few days, thank you! So, what I can't really do is mix the LE creations with REST objects until CD catches up 48 hours later; this is a shame.
Yes, the Canvas Calendar will already contain the assignments, but we were experimenting with putting those assignments into our own timetable calendars. We're currently using REST for this, but it's really just too slow. We were considering a cache, probably refreshing nightly, but then it would have been nice to top that up with LE data, which it seems we can't!
*EDIT* I hadn't realised that the CD / LE IDs were in the format shard+000000000000+id, where id is addressable from the REST API; this would help me!
Is there currently a GraphQL API available?
Just found https://canvas.beta.instructure.com/doc/api/file.graphql.html, so I'll take a look.
The docs are useful, but since it's a work in progress they don't seem to be complete (e.g.: no mention of the longer CD/LE identifiers in queries.)
In my experience the best thing at the moment is to fire up https://{institution}.test.instructure.com/graphiql (I'm forever forgetting to type the "i") and just poke around.