Hi @mk4261
That definitely seems weird. It sounds like you were getting files from someone else, and I wonder if these were filtered or out of date. You said you also got them directly out of the Canvas Data Portal LTI?
These files should be updated daily, with a 2 day delay in what you see vs what's currently in Canvas.
It's kinda hard to tell what you're missing, but maybe I'll share a way to help you test if you are...
The dim and fact table for each set should have the same row count.
SELECT count(*) FROM discussion_entry_dim;SELECT count(*) FROM discussion_entry_fact;
If the fact table is missing rows, the INNER JOIN will only show you rows where the JOIN works.
In your query above, if you change it to a LEFT JOIN, do you get a different result count.
-- should return the same row count as the 2 aboveSELECT count(*) FROM discussion_entry_dim INNER JOIN discussion_entry_fact ON discussion_entry_dim.id = discussion_entry_fact.discussion_entry_id
Are you missing rows?
Beyond that, I don't see anything wrong with your query. If you have access to some other tables, I did a little tinkering to try and make reviewing/comparing what you see in Canvas and the DB a little easier. This query will return a discussion topic, using the discussion id you would see in Canvas, or the URL. The results trimmed and sorted to show a threaded view, if the discussion is threaded, you can switch them out if not.
SELECT cd.canvas_id AS canvas_course_id , cd.name AS course_name , dtd.canvas_id AS topic_canvas_id , dtd.discussion_type , dtd.title AS topic , dtd.[message] AS topic_body , ded.canvas_id AS discussion_entry_canvas_id , parent_discussion_entry_id , ded.depth , ded.[message] AS discussion_entry -- if this is null, could be an attachment , def.message_length AS discussion_entry_length , dtd.workflow_state AS discussion_entry_state , ded.created_at AS discussion_entry_created_at , ded.updated_at AS discussion_entry_updated_at , CASE WHEN parent_discussion_entry_id IS NOT NULL THEN parent_discussion_entry_id % 10000000000000 ELSE ded.canvas_id END AS thread_view FROM discussion_entry_dim ded JOIN discussion_entry_fact def ON def.discussion_entry_id = ded.id JOIN discussion_topic_dim dtd ON dtd.id = def.topic_id JOIN course_dim cd ON cd.id = dtd.course_idWHERE dtd.canvas_id = 123456 -- short topic id seen in canvas urls -- AND ded.workflow_state = 'active'ORDER BY thread_view, depth, ded.created_at-- ORDER BY ded.created_at
I tested this by looking for for discussion entries from the current summer term and grabbing one of the topic ids
SELECT def.course_id , def.topic_id , count(ded.id) entriesFROM discussion_entry_dim ded JOIN discussion_entry_fact def ON def.discussion_entry_id = ded.id JOIN discussion_topic_dim dtd ON dtd.id = def.topic_idWHERE def.enrollment_term_id = 100000000000001GROUP BY def.course_id, topic_id
I grabbed one with 53 replies and opened it in Canvas. Then in the browser's Developer Tools, I took a count of the current entries. I got 62.
document.querySelectorAll('.discussion_entry')
Lot of Canvas toys... so I went further.
We also collect Live Events, discussion_entry_created in SQL.
So I counted that and found 63, maybe someone deleted one, can't see it in CD yet.
SELECT count(*)FROM live_discussion_entry_createdWHERE discussion_topic_id = 1233456-- 63SELECT CONVERT(date, created_at), COUNT(*)FROM live_discussion_entry_createdWHERE discussion_topic_id = 1233456GROUP BY CONVERT(date, created_at)-- 2020-06-05 2-- 2020-06-06 7-- 2020-06-07 1-- 2020-06-08 21-- 2020-06-09 13-- 2020-06-10 9-- 2020-06-11 3-- 2020-06-12 7
This lines up with the 2 day delay in CD... we have 10 entries since yesterday. Tonight I should see the 3 from yesterday, and so on.
Not sure if any of that'll help, let me know.
cc: Canvas Developers, Data and Analytics
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.