Your Community is getting an upgrade!
Read about our partnership with Higher Logic and how we will build the next generation of the Instructure Community.
Could people post sample SQL they are using and short description of what it pulls?
Solved! Go to Solution.
Hi Matt, I realize this is an old post but I just came across it and wanted to see if I could help, if you haven't already solved it. When trying to match Redshift to the Canvas Analytics numbers, have you tried limiting your data by the workflow_state for both the item that you are trying to count (assignments, discussions, etc) and for the courses in the account?
By selecting data from course_dim with a workflow_state of either 'active' or 'completed' and then finding the right combinations of the workflow_state's of the other dimension tables, I have some queries that seem to work for me. I have only tested this on a few accounts, but when I did my query results matched the numbers returned in a Canvas Analytics API call.
My queries are below. I have union-ed them here to simplify testing, they just need the canvas id's for a term and an account. It should return number of Courses, Subaccounts, Students, Teachers, Discussion Topics, and Assignments in an Account.
Hope this helps. Let me know if you have questions.
------------------------------------------------------------------------------------------
SELECT 'courses' AS item, COUNT(DISTINCT id)
FROM course_dim
WHERE enrollment_term_id IN (SELECT id FROM enrollment_term_dim WHERE canvas_id = [TERM CANVAS ID])
AND account_id IN (SELECT id FROM account_dim WHERE canvas_id = [ACCOUNT CANVAS ID] OR parent_account_id = [ACCOUNT CANVAS ID] )
AND workflow_state IN ('available', 'completed')
UNION ALL
SELECT 'subaccounts' AS item, COUNT(DISTINCT id)
FROM account_dim
WHERE parent_account_id = [ACCOUNT CANVAS ID]
UNION ALL
SELECT CASE WHEN d.type LIKE 'StudentEnrollment' THEN 'students' WHEN d.type LIKE 'TeacherEnrollment' THEN 'teachers' END AS item, COUNT(DISTINCT user_id)
FROM enrollment_dim AS d JOIN enrollment_fact AS f ON (d.id = f.enrollment_id)
WHERE type IN ('StudentEnrollment','TeacherEnrollment')
AND workflow_state IN ('active','completed')
AND f.course_id IN
( SELECT DISTINCT id
FROM course_dim
WHERE enrollment_term_id IN (SELECT id FROM enrollment_term_dim WHERE canvas_id = [TERM CANVAS ID] )
AND account_id IN (SELECT id FROM account_dim WHERE canvas_id = [ACCOUNT CANVAS ID] OR parent_account_id = [ACCOUNT CANVAS ID] )
AND workflow_state IN ('available','completed')
)
GROUP BY d.type
UNION ALL
SELECT 'discussion_topics' AS item, COUNT(DISTINCT d.id)
FROM discussion_topic_dim AS d JOIN discussion_topic_fact AS f ON (d.id = f.discussion_topic_id)
WHERE workflow_state <> 'deleted'
AND f.course_id IN
( SELECT DISTINCT id
FROM course_dim
WHERE enrollment_term_id IN (SELECT id FROM enrollment_term_dim WHERE canvas_id = [TERM CANVAS ID] )
AND account_id IN (SELECT id FROM account_dim WHERE canvas_id = [ACCOUNT CANVAS ID] OR parent_account_id = [ACCOUNT CANVAS ID] )
AND workflow_state IN ('available','completed'))
UNION ALL
SELECT 'assignments' AS item, count(distinct d.id)
FROM assignment_dim AS d JOIN assignment_fact AS f ON (d.id = f.assignment_id)
WHERE workflow_state <> 'deleted'
AND f.course_id IN
( SELECT DISTINCT id
FROM course_dim
WHERE enrollment_term_id IN (SELECT id FROM enrollment_term_dim WHERE canvas_id = [TERM CANVAS ID] )
AND account_id IN (SELECT id FROM account_dim WHERE canvas_id = [ACCOUNT CANVAS ID] OR parent_account_id = [ACCOUNT CANVAS ID] )
AND workflow_state IN ('available','completed'));
Yes, this would be helpful. I still can't get my queries to match the Canvas Analytics pages and I haven't been able to determine if this is a problem with Canvas Analytics, Canvas Redshift, our Institutional data, or simply my own queries.
OK. Let's start putting SQL in this thread and then we can comment as needed to get accepted code.
Hi Matt, I realize this is an old post but I just came across it and wanted to see if I could help, if you haven't already solved it. When trying to match Redshift to the Canvas Analytics numbers, have you tried limiting your data by the workflow_state for both the item that you are trying to count (assignments, discussions, etc) and for the courses in the account?
By selecting data from course_dim with a workflow_state of either 'active' or 'completed' and then finding the right combinations of the workflow_state's of the other dimension tables, I have some queries that seem to work for me. I have only tested this on a few accounts, but when I did my query results matched the numbers returned in a Canvas Analytics API call.
My queries are below. I have union-ed them here to simplify testing, they just need the canvas id's for a term and an account. It should return number of Courses, Subaccounts, Students, Teachers, Discussion Topics, and Assignments in an Account.
Hope this helps. Let me know if you have questions.
------------------------------------------------------------------------------------------
SELECT 'courses' AS item, COUNT(DISTINCT id)
FROM course_dim
WHERE enrollment_term_id IN (SELECT id FROM enrollment_term_dim WHERE canvas_id = [TERM CANVAS ID])
AND account_id IN (SELECT id FROM account_dim WHERE canvas_id = [ACCOUNT CANVAS ID] OR parent_account_id = [ACCOUNT CANVAS ID] )
AND workflow_state IN ('available', 'completed')
UNION ALL
SELECT 'subaccounts' AS item, COUNT(DISTINCT id)
FROM account_dim
WHERE parent_account_id = [ACCOUNT CANVAS ID]
UNION ALL
SELECT CASE WHEN d.type LIKE 'StudentEnrollment' THEN 'students' WHEN d.type LIKE 'TeacherEnrollment' THEN 'teachers' END AS item, COUNT(DISTINCT user_id)
FROM enrollment_dim AS d JOIN enrollment_fact AS f ON (d.id = f.enrollment_id)
WHERE type IN ('StudentEnrollment','TeacherEnrollment')
AND workflow_state IN ('active','completed')
AND f.course_id IN
( SELECT DISTINCT id
FROM course_dim
WHERE enrollment_term_id IN (SELECT id FROM enrollment_term_dim WHERE canvas_id = [TERM CANVAS ID] )
AND account_id IN (SELECT id FROM account_dim WHERE canvas_id = [ACCOUNT CANVAS ID] OR parent_account_id = [ACCOUNT CANVAS ID] )
AND workflow_state IN ('available','completed')
)
GROUP BY d.type
UNION ALL
SELECT 'discussion_topics' AS item, COUNT(DISTINCT d.id)
FROM discussion_topic_dim AS d JOIN discussion_topic_fact AS f ON (d.id = f.discussion_topic_id)
WHERE workflow_state <> 'deleted'
AND f.course_id IN
( SELECT DISTINCT id
FROM course_dim
WHERE enrollment_term_id IN (SELECT id FROM enrollment_term_dim WHERE canvas_id = [TERM CANVAS ID] )
AND account_id IN (SELECT id FROM account_dim WHERE canvas_id = [ACCOUNT CANVAS ID] OR parent_account_id = [ACCOUNT CANVAS ID] )
AND workflow_state IN ('available','completed'))
UNION ALL
SELECT 'assignments' AS item, count(distinct d.id)
FROM assignment_dim AS d JOIN assignment_fact AS f ON (d.id = f.assignment_id)
WHERE workflow_state <> 'deleted'
AND f.course_id IN
( SELECT DISTINCT id
FROM course_dim
WHERE enrollment_term_id IN (SELECT id FROM enrollment_term_dim WHERE canvas_id = [TERM CANVAS ID] )
AND account_id IN (SELECT id FROM account_dim WHERE canvas_id = [ACCOUNT CANVAS ID] OR parent_account_id = [ACCOUNT CANVAS ID] )
AND workflow_state IN ('available','completed'));
Thanks for the query. I have filtered on workflow, but maybe not on multiple fronts like you have. I'll give this a try.
Meghan, this works great unless there are cross-listed courses in play. I just posted a query on the thread Matt asked this question that takes cross listing into account: Re: Redshift vs. Canvas Analytics
To interact with Panda Bot, our automated chatbot, you need to sign up or log in:
Sign InTo interact with Panda Bot, our automated chatbot, you need to sign up or log in:
Sign In
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.