@Doug9
First, this sounds really inefficient. There's no need to make one API call per student per section; you can get an entire course's worth of students at one time (pagination will be involved).
I never use the section endpoint for submissions since I want data for the whole course and a student may be enrolled in multiple sections (perhaps not at your institution, but Canvas allows for it).
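To make the multiple-section point concrete: if you loop over sections, a student enrolled in two sections shows up in both rosters and gets fetched twice. This small Python sketch assumes you already have a hypothetical mapping of section ID to enrolled student IDs and collapses it into one list of unique students, which is what the course-level endpoint gives you anyway.

```python
def unique_students(section_rosters: dict) -> list:
    """Return each student ID once, in first-seen order, even if the
    student is enrolled in more than one section."""
    seen = set()
    ordered = []
    for roster in section_rosters.values():
        for student_id in roster:
            if student_id not in seen:
                seen.add(student_id)
                ordered.append(student_id)
    return ordered


# Hypothetical data: section 102 shares student 3 with section 101.
rosters = {101: [1, 2, 3], 102: [3, 4]}
print(unique_students(rosters))  # [1, 2, 3, 4]
```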
I looked at the source code, and it looks like the course or section is used mostly to determine the list of students and assignments. The section endpoint returns a subset of the course. If you have assignments that are assigned only to specific sections, then the list of assignments for a different section would not include them, whereas the course-level endpoint returns all assignments, even ones the student wasn't assigned.
If you specify a student id, then you're just going to get the assignments for that student, regardless of whether it was assigned to the section, the student, or the whole course.
Now, back to the really inefficient comment.
If you specify student_ids[]=all, then you get all submissions for all students. If you leave off the assignment_ids[] parameter, you get them for all assignments. That's one API call, but you will have to use pagination -- even with a per_page=100 parameter. If a student is not assigned an assignment, there is no submission record for it.
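The course-level request described above can be built like this. It's a minimal Python sketch that only constructs the URL for the course-level submissions endpoint (GET /api/v1/courses/:course_id/students/submissions); the instance URL is a hypothetical placeholder, and authentication and the actual HTTP call are left out.

```python
from urllib.parse import urlencode


def submissions_url(base_url: str, course_id: int, per_page: int = 100) -> str:
    """Course-level 'list submissions for multiple assignments' URL.

    student_ids[]=all asks for every student; leaving out assignment_ids[]
    asks for every assignment.
    """
    params = [("student_ids[]", "all"), ("per_page", per_page)]
    return f"{base_url}/api/v1/courses/{course_id}/students/submissions?{urlencode(params)}"


# Hypothetical instance URL; substitute your own.
print(submissions_url("https://canvas.example.edu", 123456))
# https://canvas.example.edu/api/v1/courses/123456/students/submissions?student_ids%5B%5D=all&per_page=100
```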
Now, if you have lots of students and/or lots of assignments, then you will likely run into bookmark pagination. That can slow things down since you cannot make multiple requests in parallel: the link to the next page only arrives with the page before it. In that case, making an individual call for each student could be parallelized and end up being faster (maybe - untested).
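Here is why bookmark pagination forces one-at-a-time requests. Canvas puts the next page's URL in the Link header with rel="next", so you only learn where page 2 is after page 1 arrives. This Python sketch parses that header and follows the chain; get_page is a hypothetical function you would supply to do the actual GET and return the decoded items plus the Link header.

```python
import re
from typing import Callable, List, Optional, Tuple

# Matches one '<url>; rel="name"' entry in a Link header.
LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def fetch_all_pages(
    first_url: str,
    get_page: Callable[[str], Tuple[list, Optional[str]]],
) -> list:
    """Follow rel="next" links one page at a time.

    get_page(url) is hypothetical: it performs the GET and returns
    (decoded JSON items, Link header). The next URL is only known after
    the current page arrives, so this loop is necessarily sequential.
    """
    items: List = []
    url: Optional[str] = first_url
    while url:
        page_items, link_header = get_page(url)
        items.extend(page_items)
        url = next_page_url(link_header)
    return items
```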
However, realize that with that endpoint, you're getting extra information that you don't need. I did some testing while writing this response, and the discussion replies get delivered as part of the response. If you have a lot of those with some prolific students, then that could add to the time to download. But how do you get rid of that?
GraphQL to the rescue.
You can get all of the submission information for a course with just the information you need and it's a lot faster. You likely won't even have to mess with pagination unless your course is so big that it takes more than 30 seconds.
query courseSubmissions($courseId: ID) {
  course(id: $courseId) {
    submissionsConnection {
      nodes {
        userId
        assignmentId
        missing
        late
      }
    }
  }
}
You would then add a variables property to the request that contains the courseId. For example, here is the variables object for a Canvas course ID of 123456.
{ "courseId": 123456 }
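If you haven't sent a GraphQL request outside of GraphiQL before, the query and the variables object travel together in one JSON body. This Python sketch only builds that body; it assumes you then POST it to your instance's /api/graphql endpoint with Content-Type: application/json and your usual access token, none of which is shown here.

```python
import json

# The bare-minimum submissions query from above.
QUERY = """
query courseSubmissions($courseId: ID) {
  course(id: $courseId) {
    submissionsConnection {
      nodes {
        userId
        assignmentId
        missing
        late
      }
    }
  }
}
"""


def graphql_body(course_id: int) -> str:
    """Serialize the query and its variables as a JSON request body."""
    return json.dumps({"query": QUERY, "variables": {"courseId": course_id}})


body = graphql_body(123456)
print(json.loads(body)["variables"])  # {'courseId': 123456}
```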
That took 200 ms and delivered 1.8 kb of payload for one class with 186 submissions. For a larger class with 3409 submissions, it took 2.3 s with a 10.7 kb payload.
For comparison, in that larger class, grabbing all submissions for just one student (the call you're making) took me 3.1 s with a payload of 220.5 kb. And that's just the first 100 assignments. I had more than 100 assignments in that class. The second request, which had to wait until the first one was returned because it uses bookmarks, took 1.2 s and added another 89.8 kb.
We're looking at 4.3 s and 310.3 kb for one student. And sure, you can use a throttling library to make multiple requests, but you have to allow space between them or you exceed the allowed limit and your requests die. The heavier the request -- this seemed pretty heavy since it took 3.1 s -- the fewer requests you can make simultaneously. You're probably looking at no more than 4 students per second and that might be pushing it.
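A throttled per-student approach might look roughly like this Python asyncio sketch. It starts one request every 1/per_second seconds and lets them overlap once started. fetch is a hypothetical coroutine function standing in for the per-student submissions request, and the 4-per-second default is just the rough figure above, not a documented Canvas limit.

```python
import asyncio


async def throttled_gather(student_ids, fetch, per_second=4.0):
    """Start fetch(student_id) at most per_second times per second and
    wait for all of them; results keep the order of student_ids.

    fetch is a hypothetical coroutine function you supply. Requests
    overlap once started, so a slow response doesn't hold up the next one.
    """
    interval = 1.0 / per_second
    tasks = []
    for i, student_id in enumerate(student_ids):
        if i:  # space out every request after the first
            await asyncio.sleep(interval)
        tasks.append(asyncio.create_task(fetch(student_id)))
    return await asyncio.gather(*tasks)


async def fake_fetch(student_id):
    await asyncio.sleep(0.01)  # stand-in for a network call
    return {"student": student_id}


print(asyncio.run(throttled_gather([1, 2, 3], fake_fetch, per_second=100)))
```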
I had about 16 students in that class with 3409 submissions, so at 4 per second, that's 4 seconds to make the requests, plus about 3 seconds for each of them to come back with the initial 100 submissions for the student. That takes me out to 7 seconds before the final second-page request is made. Then another second for that second page.
8 seconds with a lot of pagination to get the entire class when I fetch them one at a time vs 2.3 s for the entire class with one request when I use GraphQL.
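The back-of-the-envelope math above, written out as a tiny Python function. It's a deliberately crude model (dispatch time, plus one first page, plus one dependent second page) that ignores throttling backoff and variable response times.

```python
def per_student_estimate(students, per_second, first_page_s, second_page_s):
    """Rough wall time for one-request-per-student fetching.

    students / per_second: time to start every request at the throttled rate
    first_page_s: time for the first page (100 submissions) to come back
    second_page_s: the dependent bookmark page, which can't start earlier
    """
    dispatch_s = students / per_second
    return dispatch_s + first_page_s + second_page_s


print(per_student_estimate(16, 4, 3.0, 1.0))            # 8.0 with the rounded figures
print(round(per_student_estimate(16, 4, 3.1, 1.2), 1))  # 8.3 with the measured timings
```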
You will probably want to make a couple of other GraphQL requests -- one to get the list of assignments in a course and one to get the list of students in a course. Then you can cross reference those to the userId and assignmentId delivered in the submissions payload.
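Cross-referencing is just a dictionary lookup once you have the three payloads. In this Python sketch, assignment_names and student_names are hypothetical dicts you would build from the assignments and users queries, and it assumes the IDs use the same form (for example, strings) in all three payloads; check that against your actual responses.

```python
def label_submissions(submissions, assignment_names, student_names):
    """Attach readable names to the bare submission nodes.

    submissions: the nodes list from the submissions query
    assignment_names: {assignment id: name}, from an assignments query
    student_names: {user id: sortable name}, from a users query
    """
    return [
        {
            **node,
            "assignment": assignment_names.get(node["assignmentId"], "(unknown assignment)"),
            "student": student_names.get(node["userId"], "(unknown student)"),
        }
        for node in submissions
    ]


# Hypothetical sample data.
nodes = [{"userId": "7", "assignmentId": "42", "missing": True, "late": False}]
print(label_submissions(nodes, {"42": "Lab 1"}, {"7": "Doe, Jane"}))
```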
I could have added that information to the submissions request itself. That's part of the beauty of GraphQL. But it's redundant, since the user and assignment information you request is repeated for every submission. Depending on the size of the class, it's not that much extra. My large class took 2.6 s and delivered 20.4 kb (vs 2.3 s and 10.7 kb) when I requested some extra information about the student and assignment.
query courseSubmissions($courseId: ID) {
  course(id: $courseId) {
    submissionsConnection {
      nodes {
        userId
        assignmentId
        missing
        late
        assignment {
          name
        }
        user {
          sortableName
        }
      }
    }
  }
}
Why did I show it both ways? I'm asking for the bare minimum here. You probably want due dates and other assignment information, and that could make separate requests more efficient, especially if that information is expensive (resource-wise) to fetch. I could send all three requests (submissions, assignments, users) in parallel to speed up the process.
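Sending the three queries in parallel could be as simple as this Python sketch using a thread pool. post_graphql is a hypothetical function you would supply that POSTs one query to /api/graphql and returns the decoded JSON; the labels are just dictionary keys so you can tell the results apart.

```python
from concurrent.futures import ThreadPoolExecutor


def run_queries_in_parallel(post_graphql, queries):
    """Send several GraphQL queries concurrently; return results by label.

    post_graphql is hypothetical: it POSTs one query and returns the
    decoded JSON response. queries maps a label (e.g. "submissions")
    to the query text.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        futures = {name: pool.submit(post_graphql, text) for name, text in queries.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-in for a real HTTP call, for illustration only.
fake_post = lambda text: {"chars": len(text)}
print(run_queries_in_parallel(fake_post, {"submissions": "q1", "assignments": "q22", "users": "q333"}))
```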
If you're not familiar with GraphQL, you can add /graphiql to the end of your Canvas dashboard URL to play around. If you can get it to work (not all information is available), it can really speed up the time to get the results and reduce the amount of information retrieved.