Hi jack0x539
I'm still learning these concepts, but here's my understanding.
Canvas Data (CD) has high latency but is highly accurate. It's old by the time we get it, but it's generally correct.
Live Events (LE) has low latency but is less accurate. Its inaccuracy comes from which events are available and which ones you subscribe to. There's also the possibility that an event never gets published to the queue if the code that sends it fails to complete.
Together, the two form what is called a Lambda architecture (see Lambda architecture on Wikipedia).
We have a batch layer (Canvas Data) and a speed layer (Live Events).
Neither solution solves all problems.
Our schools and teachers use an LTI we wrote on top of Canvas Data, and it has notices on each page saying the data is old. They liked the attendance screen in that LTI, but they had to wait until Monday to take attendance for Friday. Using Live Events, I was able to combine it with Canvas Data and show real-time user activity and submissions by appending rows from Live Events to Canvas Data, typically by writing a query against each set, aligning their columns, and then combining them with a UNION. Specifically, I use submission_dim along with LE's submission_created and submission_updated to get submission counts for each student, including what they've submitted today.
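To make the UNION idea concrete, here's a minimal sketch in Python with an in-memory SQLite database. The table and column names (a simplified submission_dim with id/user_id/assignment_id, and a hypothetical le_submission_events table holding stored submission_created and submission_updated events) are assumptions, not the real CD or LE schemas, and this isn't the exact query from our LTI. Both sides select the same columns so they can be UNIONed, and COUNT(DISTINCT ...) keeps a submission that shows up in both CD and LE from being counted twice.

```python
import sqlite3


def build_submission_demo_db():
    """Create an in-memory DB with simplified, hypothetical CD and LE tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE submission_dim (id INTEGER, user_id INTEGER, assignment_id INTEGER);
        CREATE TABLE le_submission_events (
            submission_id INTEGER, user_id INTEGER, assignment_id INTEGER, event_name TEXT
        );
    """)
    conn.executemany("INSERT INTO submission_dim VALUES (?, ?, ?)",
                     [(1, 10, 100), (2, 10, 101), (3, 11, 100)])
    conn.executemany("INSERT INTO le_submission_events VALUES (?, ?, ?, ?)",
                     [(2, 10, 101, "submission_updated"),   # already in CD
                      (4, 11, 101, "submission_created"),   # submitted today, not in CD yet
                      (5, 12, 100, "submission_created")])  # submitted today, not in CD yet
    return conn


def submission_counts(conn):
    """Count distinct submissions per student across CD (batch) and LE (speed)."""
    sql = """
        SELECT user_id, COUNT(DISTINCT submission_id) AS submissions
        FROM (
            -- batch layer: Canvas Data, a day or more old
            SELECT id AS submission_id, user_id, assignment_id FROM submission_dim
            UNION
            -- speed layer: stored Live Events, aligned to the same columns
            SELECT submission_id, user_id, assignment_id FROM le_submission_events
            WHERE event_name IN ('submission_created', 'submission_updated')
        )
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(submission_counts(build_submission_demo_db()))  # [(10, 2), (11, 2), (12, 1)]
```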
The drawback is that my entire data workflow now has to include more than just submissions and CD. I have highly accurate submissions rolled up in submission_dim, with an attempts column that shows me X attempts were made, but the timestamp is based on the last submission. In Live Events we have every attempt. For these purposes I don't want every attempt, though; I want the last timestamp and the submission_id, so I can link it to the assignment id and show that this student submitted that assignment 10 minutes ago.
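Reducing every LE attempt to the latest one per submission_id could look like the sketch below. It assumes each stored event has already been parsed into a dict with the hypothetical keys submission_id, user_id, assignment_id, and a timezone-aware submitted_at; in SQL you'd typically do the same thing with a window function or a MAX() subquery.

```python
from datetime import datetime, timezone


def latest_attempts(events):
    """Reduce every LE attempt to the most recent one per submission_id."""
    latest = {}
    for ev in events:
        sid = ev["submission_id"]
        if sid not in latest or ev["submitted_at"] > latest[sid]["submitted_at"]:
            latest[sid] = ev
    return latest


def recent_submissions(events, now):
    """Return (user_id, assignment_id, submission_id, minutes_ago), one row per submission."""
    rows = []
    for sid, ev in sorted(latest_attempts(events).items()):
        minutes_ago = int((now - ev["submitted_at"]).total_seconds() // 60)
        rows.append((ev["user_id"], ev["assignment_id"], sid, minutes_ago))
    return rows


if __name__ == "__main__":
    def t(hour, minute):
        return datetime(2024, 5, 3, hour, minute, tzinfo=timezone.utc)

    events = [  # made-up attempts: three for submission 7, one for submission 8
        {"submission_id": 7, "user_id": 10, "assignment_id": 100, "submitted_at": t(9, 0)},
        {"submission_id": 7, "user_id": 10, "assignment_id": 100, "submitted_at": t(9, 50)},
        {"submission_id": 7, "user_id": 10, "assignment_id": 100, "submitted_at": t(9, 20)},
        {"submission_id": 8, "user_id": 11, "assignment_id": 101, "submitted_at": t(9, 30)},
    ]
    print(recent_submissions(events, now=t(10, 0)))  # [(10, 100, 7, 10), (11, 101, 8, 30)]
```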
The precision of your data choices will depend on your use case. Here are some considerations:
- An attendance LTI that shows teachers real-time student attendance
- An SIS integration that includes hourly batch differentials
- Courses, enrollments, and assignments created in Canvas in real time
To do this without a 24+ hour delay, we need to account for the following:
- Courses, Assignments, Submissions, Users, and Enrollments from Canvas Data
- The same from Live Events, for a given time frame of stored events. I'm leaning toward 7 days. The overlap helps bridge issues in the latency gap. I'm still waffling about how long to keep them, but in general I'd say: cover CD's latency gap and allow for fault tolerance in case CD doesn't import for a couple of days, for example if the import fails over the weekend and I'm not in the office.
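Keeping a rolling window of LE data can be as simple as a scheduled delete. This sketch assumes a hypothetical le_events table whose event_time column holds ISO-8601 UTC strings in one consistent format (so string comparison follows time order); RETENTION_DAYS = 7 just mirrors the window I'm leaning toward.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: 7 days covers CD's normal lag plus a missed weekend import.
RETENTION_DAYS = 7


def prune_events(conn, now, retention_days=RETENTION_DAYS):
    """Delete LE rows older than the retention window; return how many were removed."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # event_time is stored as an ISO-8601 UTC string, so text comparison matches time order
    cur = conn.execute("DELETE FROM le_events WHERE event_time < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    now = datetime(2024, 5, 13, 12, 0, tzinfo=timezone.utc)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE le_events (event_name TEXT, event_time TEXT)")
    for days_old in (1, 6, 8, 30):
        conn.execute("INSERT INTO le_events VALUES (?, ?)",
                     ("submission_created", (now - timedelta(days=days_old)).isoformat()))
    print(prune_events(conn, now))  # 2 (the 8- and 30-day-old rows)
```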
This means that if a student was enrolled in a course today, or an assignment was created and published today, I also have to include the LE data for the user, enrollment, or assignment that doesn't exist in CD yet.
This means I stop joining LE to its CD equivalent and instead write SQL views that combine CD user_dim with LE user_created, CD enrollments with LE enrollment_created + enrollment_updated (reduced to the most recent workflow_status for each enrollment), and CD assignment_dim with LE assignment_created. Now I have views that merge batch with streaming data, so the results are both accurate and low latency.
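Here's a minimal sketch of the enrollments view, using SQLite from Python. The table names cd_enrollment_dim and le_enrollment_events and their columns are simplified, hypothetical stand-ins for CD enrollment_dim and the stored LE enrollment_created/enrollment_updated events. Rather than always trusting LE, the view compares timestamps, so an LE event only wins when it's newer than the CD row, and enrollments that exist only in LE (created today) still show up.

```python
import sqlite3

# Hypothetical merged view: newest row per enrollment across CD and LE wins.
MERGED_ENROLLMENTS_VIEW = """
CREATE VIEW enrollments_merged AS
WITH combined AS (
    -- batch layer (Canvas Data): one row per enrollment as of the last import
    SELECT id AS enrollment_id, user_id, course_id, workflow_state,
           updated_at AS ts, 'cd' AS source
    FROM cd_enrollment_dim
    UNION ALL
    -- speed layer (Live Events): every stored enrollment_created / enrollment_updated
    SELECT enrollment_id, user_id, course_id, workflow_state,
           event_time AS ts, 'le' AS source
    FROM le_enrollment_events
),
latest AS (
    SELECT enrollment_id, MAX(ts) AS ts FROM combined GROUP BY enrollment_id
)
SELECT c.enrollment_id, c.user_id, c.course_id, c.workflow_state, c.source
FROM combined c
JOIN latest l ON c.enrollment_id = l.enrollment_id AND c.ts = l.ts
"""


def build_enrollment_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cd_enrollment_dim (
            id INTEGER, user_id INTEGER, course_id INTEGER,
            workflow_state TEXT, updated_at TEXT);
        CREATE TABLE le_enrollment_events (
            enrollment_id INTEGER, user_id INTEGER, course_id INTEGER,
            workflow_state TEXT, event_time TEXT);
    """)
    conn.executemany("INSERT INTO cd_enrollment_dim VALUES (?, ?, ?, ?, ?)", [
        (1, 10, 500, "active", "2024-05-01T00:00:00"),
        (3, 12, 500, "active", "2024-05-01T00:00:00"),
    ])
    conn.executemany("INSERT INTO le_enrollment_events VALUES (?, ?, ?, ?, ?)", [
        (1, 10, 500, "active", "2024-04-28T00:00:00"),   # older than CD, ignored
        (1, 10, 500, "deleted", "2024-05-02T08:00:00"),  # newer than CD, wins
        (2, 11, 500, "active", "2024-05-02T09:00:00"),   # enrolled today, not in CD yet
    ])
    conn.execute(MERGED_ENROLLMENTS_VIEW)
    return conn


if __name__ == "__main__":
    conn = build_enrollment_demo_db()
    for row in conn.execute("SELECT * FROM enrollments_merged ORDER BY enrollment_id"):
        print(row)
```

The same pattern would apply to user_dim and assignment_dim. One caveat of this simple version: if a CD row and an LE event carry exactly the same timestamp, both come back, so you'd want a tie-breaker (for example, preferring 'le').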
The use case you describe: an assignment is created and published, the student gets visibility of the assignment, and then the student's calendar is updated with the assignment due date? Isn't this a default in Canvas?
Because the assignment event is sent with the instructor's info (or at a minimum contains nothing about the student), we have to take the assignment and its status, the course and its status, the course's enrollments, and the users, and combine LE with CD where necessary.
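As a rough sketch of that combination step, the function below takes an assignment event plus data already merged from CD and LE, and returns the active students in an available course who should see a published assignment. The input shapes are hypothetical simplifications; the state values (published, available, active) and the StudentEnrollment type follow Canvas naming, but check them against the format you subscribe to.

```python
def students_to_notify(assignment, course_states, enrollments, users):
    """Return sorted (user_id, name) pairs for active students who can see the assignment.

    Hypothetical input shapes, built from the merged CD + LE views:
      assignment:    {"id", "course_id", "workflow_state"}
      course_states: {course_id: workflow_state}
      enrollments:   [{"user_id", "course_id", "type", "workflow_state"}, ...]
      users:         {user_id: name}
    """
    if assignment["workflow_state"] != "published":
        return []  # unpublished assignments aren't visible to students
    course_id = assignment["course_id"]
    if course_states.get(course_id) != "available":
        return []  # course not published (or unknown to both CD and LE)
    student_ids = {
        e["user_id"] for e in enrollments
        if e["course_id"] == course_id
        and e["type"] == "StudentEnrollment"
        and e["workflow_state"] == "active"
    }
    return sorted((uid, users[uid]) for uid in student_ids if uid in users)


if __name__ == "__main__":
    enrollments = [
        {"user_id": 10, "course_id": 500, "type": "StudentEnrollment", "workflow_state": "active"},
        {"user_id": 11, "course_id": 500, "type": "StudentEnrollment", "workflow_state": "deleted"},
        {"user_id": 12, "course_id": 500, "type": "TeacherEnrollment", "workflow_state": "active"},
    ]
    assignment = {"id": 100, "course_id": 500, "workflow_state": "published"}
    print(students_to_notify(assignment, {500: "available"}, enrollments,
                             {10: "Ana", 11: "Ben", 12: "Cy"}))  # [(10, 'Ana')]
```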
Here's the problem...
You are using the Caliper format, which currently makes your use case difficult, possibly impossible, because there are no user_created/user_updated events in Caliper. There also isn't a course_updated event, although there is course_created. I believe Canvas is working on adding new events for Caliper, but they aren't available yet. I prefer the Canvas Alpha (or Raw) format: it has more events and is more compatible with CD and the API (see Live Events: Event Type by Format).
This fall there are changes coming that should allow us to subscribe to either, both, or a mix of Caliper and Alpha formats.
Hypothetically, there are other ways of doing this, possibly using only LE. But that ingestion process would likely look more like an UPSERT (updating records that exist and inserting new rows) versus storing all events and working from the latest status. There are even more ways of ingesting and combining CD and LE, but that's not the topic here.
The Caliper IDs and nesting are pretty awkward, but for LED I wrote a couple of functions that reduce the nested, verbose field names to underscore_notation and remove the extensions|com|instructure|canvas prefix.
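For comparison, an LE-only UPSERT ingestion might look like this sketch. It assumes a hypothetical users table keyed on id and uses SQLite's INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24+). The WHERE clause keeps an out-of-order, older event from overwriting newer data, which matters more here because you no longer have every event to fall back on.

```python
import sqlite3

# Hypothetical UPSERT for user events; only newer events overwrite the stored row.
UPSERT_USER = """
INSERT INTO users (id, name, workflow_state, event_time)
VALUES (:id, :name, :workflow_state, :event_time)
ON CONFLICT(id) DO UPDATE SET
    name = excluded.name,
    workflow_state = excluded.workflow_state,
    event_time = excluded.event_time
WHERE excluded.event_time > users.event_time
"""


def make_users_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, "
                 "workflow_state TEXT, event_time TEXT)")
    return conn


def ingest_user_event(conn, event):
    """Insert a new user row, or update the existing one if this event is newer."""
    conn.execute(UPSERT_USER, event)
    conn.commit()


if __name__ == "__main__":
    conn = make_users_db()
    for ev in [
        {"id": 10, "name": "Ana", "workflow_state": "active", "event_time": "2024-05-02T10:00:00"},
        {"id": 10, "name": "Ana B", "workflow_state": "active", "event_time": "2024-05-02T11:00:00"},
        {"id": 10, "name": "Stale", "workflow_state": "active", "event_time": "2024-05-02T09:00:00"},
    ]:
        ingest_user_event(conn, ev)
    print(conn.execute("SELECT * FROM users").fetchall())
    # [(10, 'Ana B', 'active', '2024-05-02T11:00:00')]
```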
See _flatten and _squish in ledbelly/ims_caliper.rb (master branch of ccsd/ledbelly on GitHub).
While I don't use Caliper, LED can consume Caliper events, and I'm open to un-complicating some of its specs and aligning them with the Alpha format, CD, and the API. I mention that in LED Known Issues.
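The ledbelly functions are Ruby; purely to illustrate the idea, here's a Python re-sketch, not a port of that code. flatten_event joins nested keys with underscores (turning the dots in keys like com.instructure.canvas into underscores too), and squish_canvas_extensions strips the resulting extensions_com_instructure_canvas_ prefix. The sample event is made up and heavily trimmed. Squishing could collide with a top-level key of the same name, which this sketch doesn't guard against.

```python
# Hypothetical prefix produced by flattening the Canvas extension block.
CANVAS_EXT_PREFIX = "extensions_com_instructure_canvas_"


def flatten_event(obj, prefix=""):
    """Flatten nested dicts into underscore_notation keys; lists are left as values."""
    flat = {}
    for key, value in obj.items():
        name = key.replace(".", "_")  # e.g. com.instructure.canvas -> com_instructure_canvas
        full = f"{prefix}_{name}" if prefix else name
        if isinstance(value, dict):
            flat.update(flatten_event(value, full))
        else:
            flat[full] = value
    return flat


def squish_canvas_extensions(flat):
    """Drop the verbose Canvas extension prefix from flattened keys."""
    return {
        (k[len(CANVAS_EXT_PREFIX):] if k.startswith(CANVAS_EXT_PREFIX) else k): v
        for k, v in flat.items()
    }


if __name__ == "__main__":
    event = {  # made-up, trimmed Caliper-style event
        "id": "urn:uuid:abc",
        "type": "NavigationEvent",
        "actor": {"id": "urn:instructure:canvas:user:10", "type": "Person"},
        "extensions": {"com.instructure.canvas": {"request_id": "r1"}},
    }
    print(squish_canvas_extensions(flatten_event(event)))
    # {'id': 'urn:uuid:abc', 'type': 'NavigationEvent',
    #  'actor_id': 'urn:instructure:canvas:user:10', 'actor_type': 'Person', 'request_id': 'r1'}
```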
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.