@jwals @ZaidGandhi I'm also using the enrollments table for similar purposes.
I'm trialling an analytics table loosely based on scd; so each day I'm grabbing changes in enrolments total_activity_time (ie incremental activity in a course by a user) and storing it in a way that I can aggregate/analyse at a more granular level.
Great points from @jwals about interpreting the enrolment table.
Not even sure if it's useful or works as intended, yet, but basically I execute the following insert each day in databricks immediately after the cd2 incremental update completes. Because the table could get huge, there's a monthly aggregation script with a clean up of the daily data (could do more or less frequently, or partition eg USING DELTA PARTITIONED BY (year_month)). It's still an experiment at the moment.
CREATE TABLE IF NOT EXISTS canvas_analytics.user_activity_daily (etc...)
INSERT INTO canvas_analytics.user_activity_daily (
timestamp_column,
year_day,
year_week,
year_month,
user_id,
enrollment_id,
course_id,
course_section_id,
enrollment_type,
role_id,
total_activity_time,
incremental_activity_time,
workflow_state,
e_created_at,
e_updated_at,
e_start_at,
e_end_at,
e_completed_at
)
SELECT
current_timestamp() AS timestamp_column,
CONCAT(
EXTRACT(YEAR FROM current_date()),
'-',
LPAD(EXTRACT(DOY FROM current_date()), 3, '0')
) AS year_day,
CONCAT(
EXTRACT(YEAR FROM current_date()),
'-',
LPAD(EXTRACT(WEEK FROM current_date()), 2, '0')
) AS year_week,
CONCAT(
EXTRACT(YEAR FROM current_date()),
'-',
LPAD(EXTRACT(MONTH FROM current_date()), 2, '0')
) AS year_month,
e.user_id,
e.id AS enrollment_id,
e.course_id,
e.course_section_id,
e.type AS enrollment_type,
e.role_id,
COALESCE(e.total_activity_time, 0) AS total_activity_time,
CASE
WHEN ua.max_total_activity_time IS NULL THEN COALESCE(e.total_activity_time, 0)
ELSE COALESCE(e.total_activity_time, 0) - ua.max_total_activity_time
END AS incremental_activity_time,
e.workflow_state,
e.created_at AS e_created_at,
e.updated_at AS e_updated_at,
e.start_at AS e_start_at,
e.end_at AS e_end_at,
e.completed_at AS e_completed_at
FROM canvas.enrollments e
--JOIN canvas.pseudonyms p ON e.user_id = p.user_id
LEFT JOIN (
SELECT enrollment_id, MAX(total_activity_time) AS max_total_activity_time
FROM canvas_analytics.user_activity_daily
GROUP BY enrollment_id
) ua ON e.id = ua.enrollment_id
WHERE e.workflow_state = 'active'
AND (ua.max_total_activity_time IS NULL OR COALESCE(e.total_activity_time, 0) > ua.max_total_activity_time)
AND e.type NOT IN ('StudentViewEnrollment', 'DesignerEnrollment', 'ObserverEnrollment')