Yea me either. But in trying to consolidate, join and otherwise integrate batch and stream data Canvas has provided with Canvas Data and Live Events, I found it had a name, found it was an architecture popular about 8 years ago before products like Apache Spark, Flink, Hadoop were all designed to sync, store, and distribute these data types into a more congruous data sets.
I highly doubt 90% of anyone around here are interested in getting that far with Apache products yet...
what are some of the best ways to handle this with simple SQL?
Ideally, I need to create tables that can be populated from Canvas Data that can be merged with Live Events, leaving me with a combination of real time and batch data for our users. You talk about using Live Events to update your submissions.... great use case, but what if we go further. What if my application is attendance, and I also need to merge in Enrollments and Users.
How long should we keep the data? Currently I have to manually delete live events for each event table, but I also have a live_stream table, consisting of all common fields to 1 table for all events, which kind of does better than the requests table for things I want with less noise. So I can truncate all the individual tables (where there's a comparable canvas data table) each night after a successful canvas data import...
now, should I do a query on the last date time from the CD table, or just truncate down to midnight the previous day, have some overlap, but resolve that in my queries...
Lot's a possibilities here, they expand the more use cases you have for adding more live events... what are the best practices.
Consider submissions
submission_dim
submission_created (created, submitted)
submission_updated (updated events, LTI event post backs from vendor)
Live Events submission_create|updated, basically holds each submission and attempt, while submission_dim holds the last submitted_at and a rollup of attempts.
If I was doing this, what's best?
Write a program that UPSERTS submission_dim or
Partitions, Orders, and Selects the last submitted_at from LE and appends to submission_dim via UNION?
I have a dashboard with real time active users chart... to produce this, I have to use Live Events user_created, along with user_dim and pseudonym_dim, so I group by SIS group (employee, student, parent). I can extend this into the attendance application with enrollments and teachers can see new enrollments, both in Canvas People, and our LTI.
Maybe this is really easy for some DBAs... for some of us application developers, we're used to making our own data and having direct access to records being created and stored.
There's a few ways to do it, what experiences have people had, tried, what's easier to maintain?