Community Help

NickChevalierUT · ‎03-28-2024

Confession: I am not a developer or an SQL expert, but I can view certain kinds of code, sort of learn it, and make adjustments as needed. Otherwise, I rely heavily on ChatGPT, particularly when it comes to SQL. 😁

That said, I was wondering if someone more experienced with SQL than I might be able to please look over the SQL query down below that our team has come up with and let me know if we’re on the right track?

We are trying to verify that this query accurately reports on:

Student last name
Student first name
Student SIS user ID
Term name
Term start date
SIS course ID
Course title
Teacher name of the course in question
Submission type (from the latest submission in the course in question)
Submission date/time (from the latest submission in the course in question)
Assignment title (from the latest submission in the course in question)
Enrollment status of the student in question in the course in question

WITH LatestSubmissions AS (
    SELECT
        SPLIT_PART(u.name, ' ', 2) AS last_name, 
        SPLIT_PART(u.name, ' ', 1) AS first_name,
        p.sis_user_id AS user_id,
        et.name AS term_name,
        TO_CHAR(et.start_at - INTERVAL '4 hours', 'MM-DD-YYYY HH:MI:SS AM "ET"') AS term_start_date, -- Subtract 4 hours and format as requested
        c.course_code,
        c.name AS course_name,
        MAX(teachers.teacher_name) AS teacher_name, -- Aggregate the teacher's name
        MAX(s.submission_type) AS submission_type,
        TO_CHAR(s.submitted_at - INTERVAL '4 hours', 'MM-DD-YYYY HH:MI:SS AM "ET"') AS submitted_at, -- Subtract 4 hours and format as requested
        MAX(a.title) AS assignment_title,
        e.workflow_state AS course_enrollment_status, -- Rename the column
        ROW_NUMBER() OVER (PARTITION BY u.id, c.id ORDER BY COALESCE(s.submitted_at, '0001-01-01'::timestamp) DESC) AS row_num
    FROM
        public.users u
    JOIN
        public.enrollments e ON u.id = e.user_id
    JOIN
        public.courses c ON e.course_id = c.id
    JOIN
        public.enrollment_terms et ON c.enrollment_term_id = et.id
    JOIN
        (SELECT enroll.course_id, users.name AS teacher_name
         FROM public.enrollments AS enroll
         JOIN public.users AS users ON enroll.user_id = users.id
         WHERE enroll.type = 'TeacherEnrollment') AS teachers ON c.id = teachers.course_id
    JOIN
        public.pseudonyms p ON u.id = p.user_id
    JOIN
        public.submissions s ON e.course_id = s.course_id AND u.id = s.user_id
    JOIN
        public.assignments a ON s.assignment_id = a.id
    WHERE
        e.type = 'StudentEnrollment'
        AND et.sis_source_id = 'SP_2024'
        AND s.submission_type IS NOT NULL -- Exclude null submission types
        AND s.submitted_at IS NOT NULL -- Exclude null submission times
    GROUP BY
        u.id, c.id, p.sis_user_id, et.name, et.start_at, s.submitted_at, e.workflow_state -- Include enrollment status in the GROUP BY clause
)
SELECT
    last_name,
    first_name,
    user_id,
    term_name,
    term_start_date,
    course_code,
    course_name,
    teacher_name,
    submission_type,
    submitted_at,
    assignment_title,
    course_enrollment_status -- Include enrollment status with the new name
FROM
    LatestSubmissions
WHERE
    row_num = 1
ORDER BY
    last_name, course_code ASC;

Please let me know if any other information is needed.

Thank you!

kyle_cole · ‎03-28-2024

Taking a quick look at this, it does look like you will achieve what you are looking for. When you run it does it produce what you were expecting?

NickChevalierUT · ‎03-29-2024

It does produce what we're expecting, but I just wanted to be extra sure since our Financial Aid department will be using this report as a metric to determine student course activity in the event of a withdrawal. Thank you for taking a look, Kyle!

James · ‎04-01-2024

@NickChevalierUT

If you're wanting to use this for Financial aid reporting (anything that has significant consequences), be careful. I recently tried to recreate a local database used in the SIS imports using the data in Canvas Data 2 (CD2). It turned out there were missing items in Canvas Data 2. I had about 350 users who had completed an assignment (I know this because they had to complete the assignment to get out of the orientation course and into their regular classes) but CD2 had no record of them. Then I tried to get a list of email addresses and it turns out that I have none in CD2, even though querying the API shows I do.

Now granted, most of the missing data seems to be from a long time ago (my Canvas account was created in 2012 and most of the people missing the orientation assignment we from 2014). It seemed to do a better job with more recent information.

There were a couple of other places that I had to download the provisional reports and re-create the information from those. I wanted CD2 since my tables had created_at and updated_at columns and those don't come in the provisioning reports.

stimme · ‎03-28-2024

Hi @NickChevalierUT ! This an interesting query and reporting goal. I've noted a some thoughts.

One way to verify that you'll get what you want in the final result is to run the custom table query LatestSubmissions on its own and inspect the results.
The assignments and submissions joins aren't quite right. Assignments should be joined to courses on c.id = assignment.context_id. Then Submissions joined on s.assignment_id = a.id AND s.user_id = u.id (you do not need to join submissions to courses, because the assignment record is unique to a course).
Do you intend to exclude enrollment records where a student has no online submissions? That is the effect of "s.submitted_at IS NOT NULL", as well as doing an inner join on the assignments table for courses which contain no assignments. If you want to guarantee you'll have one record per student-in-course, your subquery for assignments and submissions should be separated from the subquery for enrollments & names of courses, users, & terms.
For student name fields, you will probably have better results if you split the users.sortable_name column on the comma.
Canvas does not require that course name or course code be unique. That could affect your result.
The "teachers" subsubquery might be improved by adding a WHERE condition on enroll.workflow_state.
The way students' enrollment workflow_state is included could be confusing. It's possible for a student to have both active and deleted enrollments in one course (whether connected to different student roles or different sections). I think the WHERE conditions should include either e.workflow_state = 'active' or e.workflow_state != 'deleted' (the latter is a little less strict).
I'm not sure about the ROW_NUMBER() OVER function; no experience with it.

saustin · ‎03-29-2024

@stimme We just wanted to thank you for these amazing recommendations -- I'm working together with @NickChevalierUT on this beautiful report he created, and as much as we love his SQL, we wanted more eyes on it to be absolutely sure. We've taken the items you've listed as actual tasks to check. As you are probably aware, follow-up work is slow, and we are still working on it. There's radio silence here because it takes time to check. That said, when we're done, we'll follow up here, but we did want to let you know how much we appreciate this response. You not only explained what, you explained why. This is just invaluable.

Thank you again.

--Sharon Austin

NickChevalierUT · ‎04-01-2024

Yes, @stimme, thank you for the valuable feedback! As Sharon mentioned, we're working on those tweaks and will let you know how it goes.

KeithSmith_au · ‎04-01-2024

There are other technical issues with the fact that lots of those tables are not just one-one relationships. The query may work for the particular data, but is not guaranteed by any means.

Any given user can have multiple pseudonyms (these are login records). Even if the institution only supports a single record, then at the very least include a where to select an active workflow state. As it stands, if you ever change the sis_id for a user, you will get multiple records for that user.
You are obviously aware that there can be multiple teacher enrolments in a course, including multiple enrolments for the same teacher if they are added to multiple sections in the course. Using MAX to aggregate on the name is not a reliable or safe way to get a single teacher name. At the very least, the workflow state of teacher enrolments should be restricted to those that are active. If a single name (only) is required, then selecting either the oldest or newest active teacher enrolment would be more reliable and consistent.
As you have the student enrolment state in the group, and students may have multiple enrolments in a course (if in multiple sections, or moved between sections), you can get multiple records for the same student. As you are looking to use this in the event of withdrawal, excluding deleted (or other non-active) enrolment states would not be an option.
The method of aggregating multiple submissions for the same user in the same course, and the join to assignments could well result in incorrect reporting of submission type against submitted at:
- Any given course can have multiple different assignments, and the student may have made submissions in each
- This query arbitrarily uses MAX to get a single submission_type across all the submissions joined in. This value will be returned in all the resulting rows - but may not be the submission_type for that row.
- The submitted_at date is returned for each submission that matches
- Using MAX to aggregate the assignment title is to little real purpose, as the submission is tied to the assignment by id, and although there may be multiple submission rows, for the same assignment_id, they will always be associated to the same assignment record, and it will have the same value - but is required as you are not grouping by that value.

If the requirement here is to return the most recent submission in each course for each student (and return all students that were ever enrolled in a course in the given term), then this query is not going to achieve it or give accurate results for what is does return.

Rather than using aggregations arbitrarily, if you have sufficiently small data sets, you are better off qualifying your sub-selects to only return the rows you are after. For example, for teachers change the select to something like

(SELECT enroll.course_id, users.name AS teacher_name,
ROW_NUMBER() OVER (PARTITION BY enroll.course_id, users.name ORDER BY enroll.created_at DESC) AS row_num
FROM public.enrollments AS enroll
JOIN public.users AS users ON enroll.user_id = users.id
WHERE enroll.type = 'TeacherEnrollment' and enroll.workflow_state != 'deleted')
AS teachers ON c.id = teachers.course_id and teachers.row_num = 1

That would give you the most recently added teacher enrolment that was not deleted - but only one for each course. You should be left outer joining this, as some courses may not have a non-deleted teacher enrolment, or indeed any teacher enrolment at all.

Likewise, instead of the current approach to submissions, where you are aggregating, try something more along the lines of

(select submiss.submission_type AS submission_type,
submiss.submitted_at,
ROW_NUMBER() OVER (PARTITION BY submiss.user_id, submiss.course.id ORDER BY
COALESCE(s.submitted_at, '0001-01-01'::timestamp) DESC) AS row_num_sub
from
public.submissions submiss
where submiss.submission_type IS NOT NULL -- Exclude null submission types
AND susmiss.submitted_at IS NOT NULL) -- Exclude null submission times
as s ON e.course_id = s.course_id AND u.id = s.user_id and s.row_num = 1

This would allow a direct join of the assignment (left outer join as there may be no submission to start with) to get the correct assignment detail (this subquery should be left outer joined as well, to ensure you do not miss students with no submissions).

You would also need to filter down the enrolments similarly to get a single enrolment per student in a course (as you only need the user id it doesn't really matter which one, except that you are after the enrolment status, so you would need to order to get the "correct" first one. If all you want is active in preference to deleted, but don't mind which other status you get, it is simple, but if you want anything other than deleted in preference to deleted, the query gets much trickier).

saustin · ‎04-02-2024

@KeithSmith_au Your response, as in the case of others from this community, is absolutely invaluable to us. The nuts and bolts of getting not only results, but interpreting the results, is beyond gold.

To the point,

Any given user can have multiple pseudonyms (these are login records). Even if the institution only supports a single record, then at the very least include a where to select an active workflow state. As it stands, if you ever change the sis_id for a user, you will get multiple records for that user.

Yes, we want to preserve getting multiple records for a user. There are very rare instances in which we have multiple sis_id's for a user, usually given by accident, but for which we want a record of any transaction should the records not be merged yet. It is a safety net.

To the point,

You are obviously aware that there can be multiple teacher enrolments in a course, including multiple enrolments for the same teacher if they are added to multiple sections in the course. Using MAX to aggregate on the name is not a reliable or safe way to get a single teacher name. At the very least, the workflow state of teacher enrolments should be restricted to those that are active. If a single name (only) is required, then selecting either the oldest or newest active teacher enrolment would be more reliable and consistent.

We agree that MAX aggregate is not a reliable or safe way to get a teacher name. We will be removing that component of the code; thank you for the reality check and confirmation. That said, we want to include the multiple teachers, as often laboratories represent team-teaching. At this point, I am unsure too, if we should restrict to active state because when instructors must leave the institution for any reason, we want to know what was assigned to the students left behind. Sometimes, faculty do leave suddenly for health or other reasons.

In particular, this suggestion of yours is particularly invaluable as it relates to finding submissions

This would allow a direct join of the assignment (left outer join as there may be no submission to start with) to get the correct assignment detail (this subquery should be left outer joined as well, to ensure you do not miss students with no submissions).

You made so many great points, we cannot address them all here, but please know that we are taking each point you've made into consideration, and as in the case above with @stimme exact SQL recommendations, are incorporating them as revisions, and testing. The testing is so, so slow, as you must know. But you have to know how appreciative we are of the targeted responses. Thank you so, so much. We'll look forward to any additional insights you may have as we continue to revise and test.

KeithSmith_au · ‎04-02-2024

Yes, we want to preserve getting multiple records for a user. There are very rare instances in which we have multiple sis_id's for a user, usually given by accident, but for which we want a record of any transaction should the records not be merged yet. It is a safety net.

Depending on how you manage merging, you may get various results. If the issue is that an additional user is created by mistake (with a different sis_id) - i.e. treated as 2 different users, and then when the mistake is realised, the users are merged - you don't really want 2 records. If in this case, the erroneous user is made inactive in the Student Records system, and a merge is done in Canvas, only 1 actual user record exists - it just has 2 sis_ids attached (due to the merge). In this case, you need to select only the one that remains active.

We agree that MAX aggregate is not a reliable or safe way to get a teacher name. We will be removing that component of the code; thank you for the reality check and confirmation. That said, we want to include the multiple teachers, as often laboratories represent team-teaching. At this point, I am unsure too, if we should restrict to active state because when instructors must leave the institution for any reason, we want to know what was assigned to the students left behind. Sometimes, faculty do leave suddenly for health or other reasons.

What you should / could get will depend a little on the process that occurs when faculty leave - as to whether the enrolment is deleted, or just made inactive (removing access, but retaining history). We would never automatically delete enrolments (for staff or students) but simply make them inactive, preserving the history (even more important for students, as deleting an enrolment deletes all the submissions - forever).

If you want a single result with the names of all non-deleted teachers, it will depend on what SQL engine you are using - as those sorts or aggregation functions tend to be dialect specific. For Microsoft SQL server, left outer joining something like the following in place of your teacher selection will give you a comma separated list of all the teacher names for the course (assuming it fits with in 8,000 characters - otherwise you will need to convert to a nvarchar(max) to allow it to work.,

(SELECT enroll.course_id,
string_agg(users.name, ', ') AS teacher_name,
FROM public.enrollments AS enroll
JOIN public.users AS users ON enroll.user_id = users.id
WHERE enroll.type = 'TeacherEnrollment' and enroll.workflow_state != 'deleted'
group by enroll.course_id)
AS teachers ON c.id = teachers.course_id

saustin · ‎04-02-2024

@stimme Good afternoon, I was in the process of incorporating recommendations by this amazing community, and was looking a bit hard at the following that you recommended:

The way students' enrollment workflow_state is included could be confusing. It's possible for a student to have both active and deleted enrollments in one course (whether connected to different student roles or different sections). I think the WHERE conditions should include either e.workflow_state = 'active' or e.workflow_state != 'deleted' (the latter is a little less strict).

I have a question of ignorance. Users can have two roles in Canvas, something I was surprised to learn. In fact, I could be wrong -- and PLEASE correct me if I am wrong on this, but to be a member of a group, someone must have a role of student.

I certainly understand that it is possible for a student to have both active and deleted roles in one course, but when the student can have TWO roles in a course, for example, a teacher role AND a student role, and either of those roles can be in different workflows, do you think that the workflow state you are asking us to consider would "mask" either of the roles if someone were to have two roles in a course?

Here is an example of myself as both a teacher and a student in one course. -- and thank you again so, so much for all this incredible feedback you are providing.

stimme · ‎04-02-2024

@saustin You are correct that group memberships can only be created for users enrolled as Student (or some custom role based on Student). Teachers and TAs can see every course group's tools, homepage, etc. and may receive notifications for certain types of activity occurring in any course group (e.g. announcements).

The maximum number of enrollments one user can have in one course is the product of sections x roles in your Canvas instance. The e.type = 'StudentEnrollment' WHERE clause excludes the enrollments of other types (Teacher, TA, etc.) from the result set. The e.workflow_state = 'active' WHERE clause excludes enrollments with other states. Specifying the type and the workflow_state narrows the result set to focus on active student enrollments (which could exist for each section in the course).

The question about what non-student and non-active enrollments an active student has in a course couldn't be answer by the results of the query that excludes them. I was imagining that those questions would be answered by a different sort of query. Now I think I may not have understood the meaning of reporting the "enrollment status of the student in question in the course in question" in @NickChevalierUT 's original post. What does "enrollment status" mean across administrative units in your institution? Does it matter to them if a student has a second role in Canvas?

KeithSmith_au · ‎04-02-2024

@stimme wrote:

maximum number of enrollments one user can have in one course is the product of sections x roles in your Canvas instance. The e.type = 'StudentEnrollment' WHERE clause excludes the enrollments of other types (Teacher, TA, etc.) .

...

in @NickChevalierUT 's original post. What does "enrollment status" mean across administrative units in your institution? Does it matter to them if a student has a second role in Canvas?

This is correct. the e,type in the enrollment record is generic - and refers to the base type the actual role was based on - to my knowledge there are 6 possible values (documented in the Canvas Data 2 type for class enrollments__type - being

StudentEnrollment
ObserverEnrollment
StudentViewEnrollment
TaEnrollment
TeacherEnrollment
DesignerEnrollment

As there can be any number of custom roles created using those base types, then there can potentially be many "StudentEnrollment" records for a single user in a single section.

If I understand what your reporting is intended for, you are really only interested in the "state" of a particular person in a particular course overall - not the individual enrolments. You will probably have to assign a hierarchy of preference for workflow_state of the enroilments, and take the ""most appropriate" state.

Possible workflow_states for enrollments are (roughly in the order in which you probably want to consider them):

active
invited
creation_pending
completed
rejected
inactive
deleted

Some of these are a little vague - my understanding is that creation_pending is roughly equivalent to invited, but is used for student invitations when the course is unpublished.

To select the "top" from these, you cannot easily just sort - you would have to assign a hierarchy order using a case statement, then partition over the course / user combination, and order by your hierarchy, selecting the first applicable.

saustin · ‎04-02-2024

@James We definitely appreciate the second set of eyes your experience brings, and absolutely welcome any additional insights you may have from your experience. In a sense, we are lucky, because we have only come online with Canvas 2 Data. Our frustration is that, as we learn to navigate this database, much of the historical "how-to" applies to Canvas 1 Data.

Further, we have a T&L subscription, so we only have access to the Public Table, which may actually be a blessing in disguise, as we are supplementing anything we cannot get from the data base with API calls. This forces us to check, and cross-check, everything we do.

Even such small details as the lack of availability of emails in the database, we found discovered early, but obviously wondered if there were something else we should be doing. It is the confirmation from the community members such as yourself that give us the confidence that we should be using API calls to supplement the report.

Thank you so much for your input, and we always look forward to more!

KeithSmith_au · ‎04-03-2024

Further, we have a T&L subscription, so we only have access to the Public Table, which may actually be a blessing in disguise, as we are supplementing anything we cannot get from the data base with API calls. This forces us to check, and cross-check, everything we do.

Even such small details as the lack of availability of emails in the database,

I am not familiar with what a "T&L subscription" means, and what ""Public Table" infers - you obviously have access to many tables. I was under the impression that Canvas Data 2 when available gave access to all the tables. The only things that are not accessible are things that require versions (such as prior versions of submissions) that have not been made accessible via Canvas Data 2. Is this something your institution has put between the actual Canvas Data 2 and your access?

In that case, you will find the emails in the communication_channels table - linked to pseudonyms, Emails have a path_type value of ''email' and the actual email address is held in the path column.

saustin · ‎04-04-2024

Good evening to this amazing community. @stimme @KeithSmith_au @James @kyle_cole We have taken all your suggestions, and incorporated them, where possible. (It was not always possible, mainly due to time constraints to check them out fully).

I think we may not yet be capturing "nuance" that is possible with Canvas Data (especially as it applies to roles) but I do think the simple report we are generating here gives them (the administrators) what they are asking for. We understand that we need to work further with administrators to better understand and explain the data. (For example, we can get a last date of submission, but I found in spot-checking, that students may have made a submission without receiving a grade for it -- from a teacher's point of view, the submission may not have been "blessed".)

Our team cannot thank you enough for your input. SQL, the database, and understanding Canvas combine together to make the report. What this group brought forward to us was powerful in the extreme, and again, we can't thank you enough for it. Thank you, thank you, thank you.

Below is the latest summary, which our team will now check with API calls and individual course investigation. Again, thank you, and if you see red flags, please feel free to say something.

WITH EmailAddresses AS (

SELECT

p.sis_user_id AS p_sisuserid,

cc.path AS email_address

FROM

public.pseudonyms p

JOIN

public.communication_channels cc ON p.user_id = cc.user_id

WHERE

cc.path_type = 'email'

),

LatestSubmissions AS (

SELECT

SPLIT_PART(u.name, ' ', 2) AS last_name,

SPLIT_PART(u.name, ' ', 1) AS first_name,

p.sis_user_id AS user_id,

et.name AS term_name,

TO_CHAR(et.start_at - INTERVAL '4 hours', 'MM-DD-YYYY HH:MI:SS AM "ET"') AS term_start_date,

c.course_code,

c.name AS course_name,

teachers.teacher_name AS teacher_name,

s.submission_type AS submission_type,

TO_CHAR(s.submitted_at - INTERVAL '4 hours', 'MM-DD-YYYY HH:MI:SS AM "ET"') AS submitted_at,

a.title AS assignment_title,

e.workflow_state AS course_enrollment_status,

ROW_NUMBER() OVER (PARTITION BY u.id, c.id ORDER BY COALESCE(s.submitted_at, '0001-01-01'::timestamp) DESC) AS row_num

FROM

public.users u

JOIN

public.enrollments e ON u.id = e.user_id

JOIN

public.courses c ON e.course_id = c.id

JOIN

public.enrollment_terms et ON c.enrollment_term_id = et.id

JOIN

(SELECT enroll.course_id, users.name AS teacher_name

FROM public.enrollments AS enroll

JOIN public.users AS users ON enroll.user_id = users.id

WHERE enroll.type = 'TeacherEnrollment') AS teachers ON c.id = teachers.course_id

JOIN

public.pseudonyms p ON u.id = p.user_id

JOIN

public.assignments a ON c.id = a.context_id

JOIN

public.submissions s ON s.assignment_id = a.id AND s.user_id = u.id

WHERE

e.type = 'StudentEnrollment'

AND et.sis_source_id = 'SP_2024'

GROUP BY

u.id, c.id, p.sis_user_id, et.name, et.start_at, s.submitted_at, e.workflow_state, teacher_name, submission_type, assignment_title

)

SELECT

last_name,

first_name,

user_id,

ea.email_address,

term_name,

term_start_date,

course_code,

course_name,

teacher_name,

submission_type,

submitted_at,

assignment_title,

course_enrollment_status

FROM

LatestSubmissions

LEFT JOIN

EmailAddresses ea ON LatestSubmissions.user_id = ea.p_sisuserid

WHERE

row_num = 1

ORDER BY

last_name, course_code ASC;

Canvas Data 2 Query Review

Canvas

Canvas Data 2

Data

sql

Roll Call Attendance / Grades Alert

CD2 - how to identify which users / sections have ...

Course Audit log Published Dates from Database

Global Announcements

curiosity within Canvas Live Events - can you help...

basic MSSQL database configuration

CD2 - Assignment Override 'Noop' Set Type?

Roll Call Attendance / Grades Alert

DAP initdb error: expected UUID

Forwarding Canvas Activity Logs Externally

You're signed out

Canvas Data 2 Query Review

Community Help

View our top guides and resources: