The Instructure Community will enter a read-only state on November 22, 2025 as we prepare to migrate to our new Community platform in early December. Read our blog post for more info about this change.
I'm trying to reconcile two different data sources for the Canvas numbers: Redshift and the Canvas Analytics page. I looked at the Canvas Analytics for a sub-account (department in our case) and term (Winter 2014).
The number of Discussion Topics on the Analytics page for this dept/term is 479.
Using Redshift the numbers I'm able to get are Discussion Topics 31 and distinct count of Messages is 635.
Here is the query I've pulled out of Tableau, gives the same results when running it through Workbench:
SELECT enrollment_term_dim.name AS none_name__enrollment_term_dim__nk,
enrollment_term_dim.sis_source_id AS none_sis_source_id__enrollment_term_dim__nk,
account_dim.subaccount1 AS none_subaccount1_nk,
COUNT(1) AS cnt_number_of_records_ok,
COUNT(DISTINCT discussion_topic_dim.id) AS ctd_id__discussion_topic_dim__ok
FROM public.discussion_entry_dim discussion_entry_dim
INNER JOIN public.discussion_entry_fact discussion_entry_fact ON (discussion_entry_dim.id = discussion_entry_fact.discussion_entry_id)
INNER JOIN public.discussion_topic_dim discussion_topic_dim ON (discussion_entry_fact.topic_id = discussion_topic_dim.id)
INNER JOIN public.discussion_topic_fact discussion_topic_fact ON (discussion_topic_dim.id = discussion_topic_fact.discussion_topic_id)
INNER JOIN public.course_dim course_dim ON (discussion_entry_fact.course_id = course_dim.id)
INNER JOIN public.enrollment_term_dim enrollment_term_dim ON (course_dim.enrollment_term_id = enrollment_term_dim.id)
INNER JOIN public.account_dim account_dim ON (course_dim.account_id = account_dim.id)
WHERE ((enrollment_term_dim.name = 'Winter Quarter 2014') AND (discussion_topic_dim.workflow_state = 'active'))
GROUP BY 1,
2,
3
Note: there is no change in the data if discussion_topic_dim.workflow_state = 'active' is removed, this also seams odd
Matt, I am looking into this. Which institution are you with?
Eastern Washington University
Any information about this issue. Have you replicated it? Is there something I can do on my end?
What have you found out about this issue? Is there something wrong with the Redshift service? Something wrong with EWU data? Or is there something wrong with my query?
Are you still troubleshooting?
Please let me know where we are on this issue.
The previous numbers I submitted for the Canvas Analytics counts should be 1,220 for Discussion Topics and actually represents a College and not a department as does the SQL query.
There seems to be an issue with the Redshift data. I'm still looking into it, but for reference this is defect DS-142
Any updates on DS-142? It is hard to continue to test when I already know the Redshift data is suspect. Will you be checking against the rest of the Analytics data too? Or do you know if this is specific to the data that I initially pulled out?
Matt,
We fixed the main issues with DS-142 and they should be reflected in the query results that you get now.
You will notice that they still do not quite line up with the Analytics screen. This is because the analytics screen is taking course sections into account. In other words if a discussion thread is in a given course, then it is counted for each section of that course on the analytics screen. Your queries count the thread once for the main course it is in and that is all.
We are adding the necessary foreign keys needed to execute a query similar to what the Analytics screen is displaying.
Yes we will validate that we can replicate what is seen on the Analytics page with the correct queries in hosted data.
Excellent. Thank you.
Deactivated user now that I got Tableau functional again, I'm recreating my discussion count. I'm still not able to make the numbers match. I know you said they might not match perfect, but it is still off by more than 250 for a quarter. I simplified the query, removing the discussion entry data and am just left the discussion topic counts. I'm filtering on the workflow variable 'active'. With the Redshift data coming out of Tableau I get 11,446 results, the Canvas Analytics data shows 11,181.
Is there another variable I need to account for? Does the Canvas Analytics remove published but un used discussions from its counts?
Below is the query out of Tableau:
"SELECT ""enrollment_term_dim"".""name"" AS ""name__enrollment_term_dim_"",
""enrollment_term_dim"".""sis_source_id"" AS ""sis_source_id__enrollment_term_dim_"",
SUM(1) AS ""sum_number_of_records_ok""
FROM ""public"".""discussion_topic_fact"" ""discussion_topic_fact""
INNER JOIN ""public"".""discussion_topic_dim"" ""discussion_topic_dim"" ON (""discussion_topic_fact"".""discussion_topic_id"" = ""discussion_topic_dim"".""id"")
INNER JOIN ""public"".""course_dim"" ""course_dim"" ON (""discussion_topic_fact"".""course_id"" = ""course_dim"".""id"")
INNER JOIN ""public"".""account_dim"" ""account_dim"" ON (""course_dim"".""account_id"" = ""account_dim"".""id"")
INNER JOIN ""public"".""enrollment_term_dim"" ""enrollment_term_dim"" ON (""course_dim"".""enrollment_term_id"" = ""enrollment_term_dim"".""id"")
WHERE ((((""enrollment_term_dim"".""name"" >= 'Fall Quarter 2013') AND (""enrollment_term_dim"".""name"" <= 'Fall Semester 2015')) OR ((""enrollment_term_dim"".""name"" >= 'Spring Quarter 2013') AND (""enrollment_term_dim"".""name"" <= 'Summer Quarter 2015')) OR ((""enrollment_term_dim"".""name"" >= 'Winter Quarter 2013') AND (""enrollment_term_dim"".""name"" <= 'Winter Quarter 2016'))) AND (""discussion_topic_dim"".""workflow_state"" = 'active'))
GROUP BY 1,
2"
Deactivated user can you tell me why my count is still off by so many? Can you share with me your query you used?
thanks
Deactivated user is this no longer the best way to communicate with you? I need to better understand how the data coming out of Redshift compares to the data coming out of Analytics.
Sorry all, I was out of the office last week.
I've been looking into this since yesterday afternoon. I have my suspicions this is also related to cross-listed courses, but I want to be sure. I will keep working on it, but I have some InstructureCon responsibilities I need to attend to over the next few days, so answers might be slow coming.
Thanks you for your patience.
Matt
I am also interested in this answer - I am looking at page_views count. The results from Redshift doesn't match the Analytics API for the page_views value, nor what I see on the Course Analytics page in Canvas. Should I count a different thing, or add more filters to get the expected results?
Redshift query:
select u.canvas_id as "user id",
c.canvas_id as "course id",
count(r.id)
from requests r
inner join user_dim u on r.user_id = u.id
inner join course_dim c on r.course_id = c.id
inner join enrollment_fact ef on r.user_id = ef.user_id and r.course_id = ef.course_id
inner join enrollment_dim ed on ef.enrollment_id = ed.id
where c.canvas_id in ('565')
and ed.type = 'StudentEnrollment'
and ed.workflow_state <> 'deleted'
group by u.canvas_id, c.canvas_id
order by u.canvas_id
compared to Analytics API: https://umich.instructure.com//api/v1/courses/565/analytics/student_summaries
Thanks.
Page view counts will be different than requests. Requests will be every request that was made to the web tier. There can be multiple requests per page view. We are looking into a secondary table that might roll up requests into page views. It is not currently scheduled for v1.
Ok, I think I finally have an answer for this. Sorry to take so long.
The issue is with cross listed courses. The Canvas Analytics page for a subaccount rolls the stats for all the courses in a sub account including the cross listed courses. So, to give an example, say there is a course taught by the Math department (MATH 306) which is also crosslisted by the Computer Science department as CS 306 and the Physics department as PHYS 315. The queries above would only count metrics for MATH 306 in the Math sub account and its parents, but the Canvas Analytics page would also include the metrics in the other departments. So, concretely, the Canvas Analytics page for the Physics subaccount would include data from MATH 306 in its rollups as with the page for the Computer Science account and, of course, the Math account.
The key to getting a query in Canvas Data that matches the data on the Canvas Analytics page is getting the group of courses to match. This query replicates the logic used to determine which courses are associated with a subaccount in Canvas Analytics:
select
distinct cx.id
from account_dim a
inner join course_dim c on c.account_id = a.id
inner join course_section_dim cs on cs.nonxlist_course_id = c.id
inner join course_dim cx on cx.id = cs.course_id
where a.subaccount1_id = [subaccount_id] and
cx.enrollment_term_id = [enrollment_term_id] and
(cx.workflow_state = 'available' or cx.workflow_state = 'completed')
union
select distinct c.id
from account_dim a
inner join course_dim c on c.account_id = a.id
where a.subaccount1_id = [subaccount_id] and
c.enrollment_term_id = [enrollment_term_id] and
(c.workflow_state = 'available' or c.workflow_state = 'completed' )
So to go back to the original question about discussion topics, here is a query that will give the number of discussion topics for Winter Quarter 2014 for the College of Arts Letters and Education. This results of this query seem to match what is on the Canvas Analytics page.
select
count(distinct discussion_topic_id)
from discussion_topic_fact dtf
inner join discussion_topic_dim dtd on dtd.id = dtf.discussion_topic_id
where workflow_state != 'deleted' and course_id in
(
select distinct cx.id
from account_dim a
inner join course_dim c on c.account_id = a.id
inner join course_section_dim cs on cs.nonxlist_course_id = c.id
inner join course_dim cx on cx.id = cs.course_id
where a.subaccount1_id = 100000000102895 and
cx.enrollment_term_id = 100000000003841 and
(cx.workflow_state = 'available' or cx.workflow_state = 'completed')
union
select distinct c.id
from account_dim a
inner join course_dim c on c.account_id = a.id
where a.subaccount1_id = 100000000102895 and
c.enrollment_term_id = 100000000003841 and
(c.workflow_state = 'available' or c.workflow_state = 'completed' )
);
Here is a more generalized query that also gets the other counts on the Canvas Analytics page. This query is meant mostly to be instructive and is not necessarily a best practice. It would probably perform better if it were broken in to several smaller queries. However, this query should match the logic in the Canvas Analytics subaccount page. I did some spot checking and things seem to line up, however, I would be interested to know if anyone finds any discrepancies.
select
et.name,
a.subaccount1,
count(distinct c.id) as course_count,
count(distinct enroll_student.user_id) as enrolled_user_count,
count(distinct enroll_teach.user_id) as enrolled_teacher_count,
count(distinct dtd.id) discussion_topic_count,
count(distinct assn.id) assignment_count
from (
select distinct
a.id as account_id,
cx.id as course_id
from account_dim a
inner join course_dim c on c.account_id = a.id
inner join course_section_dim cs on cs.nonxlist_course_id = c.id
inner join course_dim cx on cx.id = cs.course_id
union
select distinct
a.id as account_id,
c.id as course_id
from account_dim a
inner join course_dim c on c.account_id = a.id
) acct_course
inner join account_dim a on a.id = acct_course.account_id
inner join course_dim c on c.id = acct_course.course_id
inner join enrollment_term_dim et on et.id = c.enrollment_term_id
left outer join (
select ef.user_id, ef.course_id
from enrollment_dim ed
inner join enrollment_fact ef on ef.enrollment_id = ed.id and (ed.workflow_state='active' or ed.workflow_state='completed') and ed.type='TeacherEnrollment'
) enroll_teach on enroll_teach.course_id=acct_course.course_id
left outer join (
select ef.user_id, ef.course_id
from enrollment_dim ed
inner join enrollment_fact ef on ef.enrollment_id = ed.id and (ed.workflow_state='active' or ed.workflow_state='completed') and ed.type='StudentEnrollment'
) enroll_student on enroll_student.course_id=acct_course.course_id
left outer join assignment_dim assn on assn.course_id = acct_course.course_id and assn.workflow_state = 'published'
left outer join discussion_topic_fact dtf on dtf.course_id = acct_course.course_id
left outer join discussion_topic_dim dtd on dtd.id = dtf.discussion_topic_id and dtd.workflow_state != 'deleted'
where
(c.workflow_state = 'available' or c.workflow_state = 'completed' )
group by et.id, et.name,et.date_start,2
order by et.date_start,et.name,2
Community helpTo interact with Panda Bot, our automated chatbot, you need to sign up or log in:
Sign inTo interact with Panda Bot, our automated chatbot, you need to sign up or log in:
Sign in
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.