Hi @mk4261
That definitely seems weird. It sounds like you were getting files from someone else, and I wonder if these were filtered or out of date. You said you also got them directly out of the Canvas Data Portal LTI?
These files should be updated daily, with a 2 day delay in what you see vs what's currently in Canvas.
It's kinda hard to tell what you're missing, but maybe I'll share a way to help you test if you are...
The dim and fact table for each set should have the same row count.
SELECT count(*) FROM discussion_entry_dim;
SELECT count(*) FROM discussion_entry_fact;
If the fact table is missing rows, the INNER JOIN will only show you rows where the JOIN works.
In your query above, if you change it to a LEFT JOIN, do you get a different result count.
SELECT count(*) FROM discussion_entry_dim
INNER JOIN discussion_entry_fact ON discussion_entry_dim.id = discussion_entry_fact.discussion_entry_id
Are you missing rows?
Beyond that, I don't see anything wrong with your query. If you have access to some other tables, I did a little tinkering to try and make reviewing/comparing what you see in Canvas and the DB a little easier. This query will return a discussion topic, using the discussion id you would see in Canvas, or the URL. The results trimmed and sorted to show a threaded view, if the discussion is threaded, you can switch them out if not.
SELECT
cd.canvas_id AS canvas_course_id
, cd.name AS course_name
, dtd.canvas_id AS topic_canvas_id
, dtd.discussion_type
, dtd.title AS topic
, dtd.[message] AS topic_body
, ded.canvas_id AS discussion_entry_canvas_id
, parent_discussion_entry_id
, ded.depth
, ded.[message] AS discussion_entry
, def.message_length AS discussion_entry_length
, dtd.workflow_state AS discussion_entry_state
, ded.created_at AS discussion_entry_created_at
, ded.updated_at AS discussion_entry_updated_at
, CASE
WHEN parent_discussion_entry_id IS NOT NULL THEN parent_discussion_entry_id % 10000000000000
ELSE ded.canvas_id
END AS thread_view
FROM discussion_entry_dim ded
JOIN discussion_entry_fact def ON def.discussion_entry_id = ded.id
JOIN discussion_topic_dim dtd ON dtd.id = def.topic_id
JOIN course_dim cd ON cd.id = dtd.course_id
WHERE dtd.canvas_id = 123456
ORDER BY thread_view, depth, ded.created_at
I tested this by looking for for discussion entries from the current summer term and grabbing one of the topic ids
SELECT
def.course_id
, def.topic_id
, count(ded.id) entries
FROM discussion_entry_dim ded
JOIN discussion_entry_fact def ON def.discussion_entry_id = ded.id
JOIN discussion_topic_dim dtd ON dtd.id = def.topic_id
WHERE def.enrollment_term_id = 100000000000001
GROUP BY def.course_id, topic_id
I grabbed one with 53 replies and opened it in Canvas. Then in the browser's Developer Tools, I took a count of the current entries. I got 62.
document.querySelectorAll('.discussion_entry')
Lot of Canvas toys... so I went further.
We also collect Live Events, discussion_entry_created in SQL.
So I counted that and found 63, maybe someone deleted one, can't see it in CD yet.
SELECT count(*)
FROM live_discussion_entry_created
WHERE discussion_topic_id = 1233456
SELECT CONVERT(date, created_at), COUNT(*)
FROM live_discussion_entry_created
WHERE discussion_topic_id = 1233456
GROUP BY CONVERT(date, created_at)
This lines up with the 2 day delay in CD... we have 10 entries since yesterday. Tonight I should see the 3 from yesterday, and so on.
Not sure if any of that'll help, let me know.
cc: Canvas Developers, Data and Analytics
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.