I think I'm running into an API bug and am hoping you can confirm. I know the URLs won't work for most of you, but hopefully you can recognize something I'm missing.
I am calling the Submissions API to return all the submissions for a particular quiz in a particular course. FWIW, there are 6300 students in this course.
The Course (982096)
https://usflearn.instructure.com/courses/982096
The Assignment : Quiz 5 (3156530)
The student, Adam Wetsch (ID 3867214), clearly has a submission with a score of 100; I can click it and see his submission to the quiz.
However, the API is not including his submission in the results. I am iterating over the 600-ish page calls, and I do not see his submission included in the list. I see more than 6000 other submissions, but his (and a few others) are not included in that list.
Here is the API call I am making:
https://usflearn.instructure.com/api/v1/courses/982096/assignments/3156530/submissions
If I use the API to load his submission directly by UserID, it loads fine.
https://usflearn.instructure.com/api/v1/courses/982096/assignments/3156530/submissions/3867214
I've put in a ticket with support to see if they can replicate what I'm seeing.
Has anyone else used Submissions and found that some records are missing?
Thanks, Glen
You might want to append a ?per_page=50 to the end of your GET statements like this. That will cut the number of API calls by a factor of 5 since the default is 10.
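The tip above can be sketched in Ruby. This is a hypothetical helper of my own (the helper name and example usage are not from Canvas); it sets `per_page` safely even if the URL already carries a query string:

```ruby
require "uri"

# Hypothetical helper: set (or override) per_page on a Canvas API URL,
# preserving any query parameters already present.
def with_per_page(url, per_page)
  uri = URI(url)
  params = URI.decode_www_form(uri.query.to_s).reject { |k, _| k == "per_page" }
  params << ["per_page", per_page.to_s]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

with_per_page(
  "https://usflearn.instructure.com/api/v1/courses/982096/assignments/3156530/submissions",
  50
)
# => "...courses/982096/assignments/3156530/submissions?per_page=50"
```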
I do use the list course assignment submissions call on our mandatory orientation (which has had several thousand students at a time in it, currently with only 1585 since we just reset the version for summer/fall enrollment). However, my approach is a little different. Once they've completed the assignment I'm checking, I delete them from the course. Since we run the process every 20 minutes and quiz submissions don't show up until they've completed the quiz or had a grade assigned, there's rarely more than a handful of students returned by the call. Still, every time someone has called to complain that they're still in the orientation and not able to get into their courses, it's been something other than the programming.
I also know that you could get 1,000 people who say it's never happened to them, but that's not proof. All it takes is one time where it doesn't work. You might be the counter-example that proves it broken.
So, let's talk debugging steps. You've probably already checked all these things, but just in case one of them makes you go aha!, I thought I'd ask.
When you pull up Adam's score by specifying his ID on the API call, is there anything out of the ordinary when compared with other students who do show up in the big list?
Are you sure the code to pull all the assignments is really pulling all of the assignments? Are you using the links header to get the next page or are you manually trying to generate the links? Are you processing the information as it comes in or gathering it all into one body for processing at the end?
Have you tried dumping the URIs that are called to look for a missing or out-of-sequence call? If there is one, that might indicate where the problem is, and even whether it's on your end or Canvas's.
Is it possible that new students are getting added to the course while you're fetching the assignment list? Yes, I'm grasping, but I'm wondering, if another student got inserted, whether it might skip over someone. I don't program in Ruby, but I know a guy who does who said he's had issues assigning unique IDs when two people hit it at the same time on multiple threads. I think the logic just needs to be fixed, but I'm trying to eliminate all possibilities, no matter how far-fetched they may be.
Is there a regular pattern to the ones that are missing? Does changing the ?per_page parameter affect which ones are missing?
Are you looking for Adam in the raw dump from the API calls or are you doing some processing first and then looking for him? Can you identify where Adam should fall in the progression and then see if there is something in the previous entry that is causing an issue (I honestly have no idea what that would be - again, grasping, throwing out things and hoping something sticks)?
Is Adam a super-hacker? Does he have a role other than student? Does he have a role other than student somewhere else in Canvas?
I'm afraid this isn't much help. This is one of those things where nothing obvious jumps out at me and I use the same API calls.
I have had issues with the API not returning information, but that was with the quiz submissions, not the assignment submissions. That was an issue of it only returning one submission while the documentation indicated it should return all of them. I think that's a different issue than what you're facing.
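One of the questions above concerns the links header. For reference, here is a minimal Ruby sketch of link-header-driven fetching, under the assumption of a standard bearer-token setup; the token, starting URL, and error handling are placeholders, not Glen's actual code:

```ruby
require "net/http"
require "json"
require "uri"

# Extract the rel="next" URL from a Link header (nil when absent).
def next_url(link_header)
  link_header.to_s[/<([^>]+)>;\s*rel="next"/, 1]
end

# Sketch of collecting every page by following the Link header verbatim,
# rather than generating page numbers manually.
def all_submissions(first_url, token)
  results = []
  url = first_url
  while url
    uri = URI(url)
    req = Net::HTTP::Get.new(uri)
    req["Authorization"] = "Bearer #{token}"
    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |h| h.request(req) }
    results.concat(JSON.parse(res.body))
    url = next_url(res["Link"]) # stop when Canvas sends no next link
  end
  results
end
```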
Hi James,
Thanks for the thoughts. I double checked my stuff and tried a few, but to no avail.
I changed the per_page to 50, and that loaded much faster overall. Good to know in general, thanks for that. But my mystery student still wasn't included in the results.
I am printing the URIs and they appear to be in sequence. With 121 URIs, it'll be tedious to verify them all manually...
FWIW, these are quiz submissions. This student in particular has only the single submission for this quiz. I'll keep digging, and I'll update my ticket with Canvas.
I think I'm hitting a bug here with Submissions.
There are 6028 records to be returned by the API.
If I make the following calls via API, all three return 6028 records total.
https://usflearn.instructure.com/api/v1/courses/982096/quizzes/1073663/submissions?per_page=100
https://usflearn.instructure.com/api/v1/courses/982096/quizzes/1073663/submissions?per_page=33
https://usflearn.instructure.com/api/v1/courses/982096/quizzes/1073663/submissions?per_page=25
I extract the Submission ID, student ID, and score from each submission record and emit that to a file.
When I sort the files and look for duplicate rows ( sort FILE | uniq -d ), the per_page=100 file has 0 duplicates, the per_page=33 file has 66 duplicates, and the per_page=25 file has 75 duplicate rows.
For the life of me it appears Canvas is repeating one or more pages. But that's not it; the duplicated records are scattered throughout the output stream.
Does this make sense to anybody else?
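The duplicate check above ( sort FILE | uniq -d ) can be mirrored in Ruby on the in-memory rows; the sample rows below are made up for illustration:

```ruby
# Equivalent of `sort FILE | uniq -d` for the emitted
# "submission_id user_id score" rows: tally counts occurrences,
# so anything counted more than once is a duplicate.
def duplicate_rows(rows)
  rows.tally.select { |_, count| count > 1 }.keys
end

rows = [
  "101 3867214 100",
  "102 1234567 95",
  "101 3867214 100", # a repeated record, as seen with per_page=33
  "103 7654321 88"
]
duplicate_rows(rows) # => ["101 3867214 100"]
```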
The numbers you gave are interesting, because each pair adds up to 100 (ok, 33+66=99). That strongly suggests a definite programming mistake somewhere. Whether it's in Canvas or in your Ruby remains to be seen.
The first thing that comes to mind is a rounding error related to the inexact representation of decimals in a binary system. There are various functions used for rounding: round, trunc, int, floor, ceil.
The links are sent as page=? and per_page=?. Since you're not calculating the specific rows to return, the mistake is unlikely to be on your end.
So, let's say that page=10 and per_page=33. The pages should be rows 1-33, 34-66, 67-99, and so on, so that by the time you get to page 10, you have rows 298-330.
But maybe, just maybe, the computer is off slightly in the representation, so it's not getting 298, it's getting 297-329 and repeating an entry, and Adam turns out to be #330, which doesn't get called.
Normally, integer calculations aren't prone to the same type of mistakes as decimals, but who knows? You might try running a per_page of 64 and see if it returns 0 duplicates or 36. I picked 64 since it's the largest power of 2 less than the per_page limit of 100 and can be represented exactly in binary.
Anyway, if this is what's going on, then it sounds like something on Canvas' side.
I guess we could look at the source code to Canvas and see if there is anything obvious. But even if you find the error, you still have to have them fix it unless you're running your own instance.
In your dump of the JSON data, can you determine how the data is ordered when it arrives from the API call? Is it by user_id, submission_id, ???
The reason I ask is because I found a note in the PostgreSQL documentation that says specifying an ORDER on the query is absolutely critical when doing pagination, and that if you don't, you might get different results. They say it's not a bug; it's a consequence of not using an ORDER clause on the SQL statement.
I'm trying to find where in the Canvas source code the lookups are, but I'm not a Ruby programmer and haven't really done MVC programming either, so I'm not making much progress. It might be quicker for you to find the order than it is for me to find the relevant code.
If it turns out that there is no order to the query, that might explain why it's messing up. You'd still have to have Canvas fix it. If there is a definite order, then we look somewhere else.
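To see why a missing ORDER BY can produce exactly this mix of duplicates and gaps, here is a toy Ruby simulation (not Canvas code, and the ordering function is made up): each "query" sees the rows in a different arbitrary order, so OFFSET/LIMIT windows no longer tile the data.

```ruby
# Toy simulation: ten rows paginated 4 at a time, but the backing store is
# free to return them in a different order on every query (as PostgreSQL
# may when no ORDER BY is given).
ROWS = (1..10).to_a

def page(rows, page_num, per_page, order)
  rows.sort_by { |r| order[r] }[(page_num - 1) * per_page, per_page] || []
end

seen = []
3.times do |i|
  # A different arbitrary order for each request.
  order = ROWS.each_with_object({}) { |r, h| h[r] = (r * 7 + i * 3) % 11 }
  seen.concat(page(ROWS, i + 1, 4, order))
end

# Some rows are now duplicated, while others never appear at all.
duplicates = seen.tally.select { |_, n| n > 1 }.keys
missing    = ROWS - seen
```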
@glparker , I looked at an API request of 36 quiz submissions from my class and was unable to discern any sort order. I looked at the user_id, the submission_id, the submitted/graded datetimes, and anything else that had unique values. So we may be onto something here. The PostgreSQL documentation said that you need to specify an ORDER clause to get consistent results with different OFFSET and LIMIT statements. I cannot guarantee that will fix the issue, but it sounds like a place for them to start looking.
I've been told that the sort order is always the order in which submissions were made. That corresponds with the submission object's "id" attribute. Grading a submission doesn't change the order they are returned in the API.
@auerbach ,
I believe I was told that at one point, too, and it would seem reasonable, but the evidence speaks louder than the words of someone. Unfortunately, there are a lot of people in the world speaking about things of which they do not know.
To illustrate, I ran the list of now 33 submissions for one of the assignments in my class.
I reindexed the submission and user ids to provide anonymity so I don't trigger a FERPA violation. I saved the entire download into one array, iterated through it, saved all the submission ids and user ids, sorted each list numerically from lowest to highest, and then used the position in the list in the report below. Besides anonymizing, it provides a much easier way to see if they are in order.
The first column is the reindexed Submission ID, the second column the reindexed User ID, the third column is the Workflow State, and the last column, when present, is the Submitted At timestamp.
The order in the list below is the order they were returned via the API call.
As you can see, there is no easily discernible order.
| Submission | User | WorkflowState | Submitted_at |
|---|---|---|---|
| 24 | 22 | unsubmitted | |
| 25 | 14 | unsubmitted | |
| 26 | 17 | unsubmitted | |
| 23 | 24 | unsubmitted | |
| 20 | 12 | unsubmitted | |
| 18 | 5 | unsubmitted | |
| 17 | 2 | unsubmitted | |
| 16 | 15 | unsubmitted | |
| 13 | 1 | unsubmitted | |
| 12 | 25 | unsubmitted | |
| 10 | 4 | unsubmitted | |
| 27 | 9 | submitted | 2015-05-04T20:31:55Z |
| 2 | 20 | unsubmitted | |
| 22 | 13 | submitted | 2015-05-07T20:36:59Z |
| 9 | 26 | submitted | 2015-05-05T21:04:36Z |
| 14 | 21 | unsubmitted | |
| 15 | 11 | submitted | 2015-05-08T01:13:46Z |
| 11 | 27 | submitted | 2015-05-07T21:38:30Z |
| 7 | 8 | submitted | 2015-05-06T16:28:03Z |
| 8 | 29 | submitted | 2015-05-09T02:04:48Z |
| 19 | 30 | submitted | 2015-05-08T15:43:10Z |
| 21 | 28 | submitted | 2015-05-06T15:53:57Z |
| 5 | 18 | unsubmitted | |
| 6 | 7 | unsubmitted | |
| 3 | 23 | submitted | 2015-05-08T23:16:55Z |
| 1 | 16 | submitted | 2015-05-08T04:52:02Z |
| 4 | 6 | unsubmitted | |
| 28 | 33 | unsubmitted | |
| 29 | 10 | unsubmitted | |
| 30 | 3 | unsubmitted | |
| 31 | 32 | unsubmitted | |
| 32 | 31 | unsubmitted | |
| 33 | 19 | unsubmitted | |
Hello,
Recently I've been running into the same trouble here, and I wonder how/why this still happens.
Has anyone solved it?
Are you having the same issue with submissions or with some other API call not returning all of the information?
It's related to not specifying an order for the data returned; when it's not specified, PostgreSQL is free to return the data in whatever order it wants to. That's not a problem when it fits on one page, but it is sometimes a problem when pagination is involved.
In addition to the postings above, there was a discussion about this in a related thread (see the bottom of it): How is the To-Do List sorted?
Somewhere (I'm having trouble finding it right now) I think Glen said he just specified a particular per_page and it worked, so he stopped worrying about it. It may have been in person when I saw him at InstructureCon15. I don't remember which per_page he picked, but the real solution is for Canvas to make sure that all of the information is returned.
Well, I've developed an external app that uses the API to get all submissions from a specific quiz.
But unfortunately I can't get it right, and I'm worried how my app is going to work this way.
Just to be clear -- by all submissions, do you mean that there are students whose submissions do not show up through the API or that there are students who completed multiple attempts and you're not getting all of those attempts?
If it's the first kind, which is what Glen was experiencing where some were sporadically and seemingly unpredictably not getting returned, then you might file a trouble ticket with Canvas and refer to this discussion and the other one I mentioned. Maybe they never fixed it.
If it's a kind where you get a bunch and then it stops, it might be timeout issues. I don't know what software you're using, but Google Sheets has a five-minute timeout. In another issue that came up, Chrome seems to apply a timeout while Firefox doesn't. This shouldn't be an issue if you're running command-line software on a server, like Perl, Python, PHP, etc. If you're accessing the results through a web browser, there might be an issue of it timing out before it's done. For instance, in PHP, there is a max_execution_time parameter that defaults to 30 seconds when running from within a web server (for example, Apache) but 0 (no timeout) when running from the command line.
If you're only getting 10 (or whatever the per_page setting is), then it's a pagination issue.
I mention those other issues just in the off-chance that we're missing something. I'd hate for you to submit a trouble ticket and find out it's something that we could have helped with. At this point, I don't know enough other than if it's the same issue that Glen was seeing, I'd file a trouble ticket. If it's anything else, it might have a simpler fix that didn't involve contacting Canvas support.
I need the API to verify which students have sent in a submission, comparing against a list of students made previously. The API data is processed through PHP on the Google App Engine service, and it seems to be unstable, because I have multiple quizzes and most of them work well. I checked the "per_page" attribute and it doesn't influence the result.
Maybe, opening a ticket is the best thing. I'll be back if a solution comes.
Thanks for replying!
Hi @glparker ,
I am going through having a look at some of the early days in Canvas Developers and checking in to see if older enquiries have been answered.
I am wondering, were you ever able to find out the cause of your issue, I am hoping I can assume that it is well and truly resolved by now, but if not, please let us know.
I am going to mark this as assumed answered for the time being, however by all means please let us know if you still have an outstanding issue and we can have another look!
Cheers,
Stuart
Hi @stuart_ryan ,
I'm having the same problem as Glen did. There's more to it, though. When I compare the Student Analysis report for a quiz with the data I get from calling the submissions API for the same quiz, I don't get data for all the students, and for those students I do get data for, only their first submission's data is present.
Changing the per_page value has an effect but doesn't solve the problem. When per_page is set to 200, I get the last 69 records ordered by student id, ignoring any subsequent submissions that may have been made by the student; when the value is 100, the same result. When the value is 50, the last 19 records are returned. When the value is 25, the same 19 are returned. When the value is 20, 9 records are returned and same again when the per_page value is set to 10. This is for a quiz allowing 2 attempts with 193 submissions.
I may be missing something but the results seem to suggest there's something amiss with the Submissions API?
The submissions API can be a little difficult to understand, but I think it's operating correctly if you understand what it's doing.
If you want submission data for additional attempts, try adding the query parameter include[]=submission_history. As a bonus, when you do this for a quiz it will also return the responses given by the students, although you have some work to do to make that usable.
Don't set per_page=200; in most cases, including the submissions API, it only supports 100 at a time.
If you keep getting the same values returned over and over, it sounds like what happens when you have a bookmark for the page in the URL. This happens with the list multiple submissions endpoint. That bookmark specifies where to start and the per_page tells it how many to take starting at that point.
For example, when I started with
/api/v1/courses/2610710/students/submissions?student_ids[]=all&assignment_ids[]=23468640&per_page=5
I got a next link header that had a query parameter of
assignment_ids[]=23468640&student_ids[]=all&page=bookmark:WzI5NTI4NDM3OF0&per_page=5
If I load that, I get 5 submissions, starting with submission_id=295284374 and ending with submission_id=295284378. That is, it's the second 5 results. If I change the per_page and make it 10, and fetch the data, then I get 10 submissions, but still starting with submission_id=295284374, containing the first five submissions from before (including the 295284378), and then five more that I didn't have before.
If you're on position 51, then you would always get the same values subject to a limitation of either the per_page or the 19 that remain. If your per_page is >= 19, then you get the last 19 submissions. If your per_page < 19, then you get the first per_page results, but starting with position 51.
If you're trying to use page= and per_page= to get all of the results, you should be careful. I regularly take advantage of that in a lot of my code, but I always check the link headers (either manually when writing the code or programmatically as part of the fetch). If Canvas doesn't supply a page= in their next header, then I do not attempt to use those because it's not supported for that API call. It may look like it works -- for example, I can set page=2&per_page=5 in my call and get the same 5 results I described before. However, if you're doing this and another submission comes in while you're fetching the results, the results are not predictable. That's why there's a bookmark for this kind of data -- to make sure that you get all of the data.
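One way to make that caution concrete: before generating your own page numbers, inspect the next link for a numeric page parameter. The helper name is mine, and the header strings mimic the two shapes seen in this thread:

```ruby
# Returns true only when the endpoint's next link uses numeric pages
# (page=2, page=3, ...) rather than an opaque bookmark.
def numeric_pagination?(link_header)
  next_url = link_header.to_s[/<([^>]+)>;\s*rel="next"/, 1]
  !!(next_url && next_url =~ /[?&]page=(?!bookmark)\d/)
end

numeric_pagination?('<https://example.com/api?page=2&per_page=100>; rel="next"')
# => true
numeric_pagination?('<https://example.com/api?page=bookmark:WzFd&per_page=5>; rel="next"')
# => false
```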
Thank you, James, and apologies for the delayed response. I've been away for a few days. I got some great clues from your post (this is a learning exercise for me as much as a problem to solve). I understand now that the Submissions API is acting correctly. I am actually using your canvasAPI() function, originally from your Course Due Dates Google sheet. When I check the Link headers, I get exactly what you suggested:
"Link":"<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=1&per_page=100>; rel=\"current\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=2&per_page=100>; rel=\"next\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=1&per_page=100>; rel=\"first\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=3&per_page=100>; rel=\"last\"","status":"200 OK"}
"Link":"<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=2&per_page=100>; rel=\"current\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=3&per_page=100>; rel=\"next\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=1&per_page=100>; rel=\"prev\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=1&per_page=100>; rel=\"first\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=3&per_page=100>; rel=\"last\"","status":"200 OK"}
"Link":"<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=3&per_page=100>; rel=\"current\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=2&per_page=100>; rel=\"prev\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=1&per_page=100>; rel=\"first\",<https://rmit.instructure.com/api/v1/courses/20447/quizzes/73499/submissions?page=3&per_page=100>; rel=\"last\"","status":"200 OK"}
So I needed to look elsewhere. Then I remembered that I had trouble with the object returned by the canvasAPI() function. When calling the Submissions API, the canvasAPI() function returns an object of the form {"quiz_submissions":[{},{},{}]} instead of [{},{},{}], so I was fixing this after using your canvasAPI() function. But what if this unexpected form of the array creates a problem within the canvasAPI() function itself? Sure enough, when looking through it I could see where it was falling over because the returned Submissions object was a single key/value pair. So, I used:
```javascript
if (json.hasOwnProperty('quiz_submissions')) {
  json = json.quiz_submissions;
}
```
to capture the condition, amend the object to the expected form and continue on. Now I'm getting all the data.
Thanks for your help and your scripts!
Cheers
Ric
The Google Sheets code was never completed; it covered the functions I was using, which relied on page and per_page, but it wasn't robust. Other people took early versions and used them and shared them, so some people who think they are using my version are really using a derivative version. Since you got it directly from the course due dates sheet, at least you're working with an original, but it was written early in my learning how things worked, and if I were to start again, I would probably make some changes. I'm very good at starting things and then getting distracted before finishing them.
I found the cause of this problem.
If I do the pagination manually (i.e., ignore the Link header), then I have to be very careful about the `per_page` parameter, because Canvas imposes an upper limit on it for each API that supports pagination, and the limits vary among the APIs. If you set `per_page` beyond the limit, Canvas silently lowers it to the maximum without telling you.
For example, when I call `GET /api/v1/accounts/1/courses?page=1&per_page=200`, Canvas lowers the page size to 50, but nothing tells me that, so my next call would be `GET /api/v1/accounts/1/courses?page=2&per_page=200`. It should respond with the 51st course through the 100th, but instead it returns the 201st through the 250th, so the 51st through the 200th courses are never fetched.
The solution is simple: set `per_page` to a ridiculously large number (e.g. 9999) and rely on the `Link` header to do the right thing. The header is a little hard to parse; here's a Ruby method to get the URL for the next page:
```
def next_link(link_header)
link_header =~ /<([^>]+)>;\s*rel="next"(?:,|$)/ && $1
end
```
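A quick sanity check of that method against a header of the shape Canvas sends (the method is repeated here so the snippet runs standalone; the URLs are illustrative):

```ruby
# The next_link method from above, repeated so this snippet is standalone.
def next_link(link_header)
  link_header =~ /<([^>]+)>;\s*rel="next"(?:,|$)/ && $1
end

header = '<https://example.com/api/v1/courses?page=2&per_page=100>; rel="next",' \
         '<https://example.com/api/v1/courses?page=1&per_page=100>; rel="first"'
next_link(header) # => "https://example.com/api/v1/courses?page=2&per_page=100"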
It sounds like I am having the same problem as @glparker .
I am following the directions for Assignment Submissions Report Programming that @James helped with, because for Financial Aid we are getting similar requests. It has been super helpful. In testing, I used a student who had a few classes to make it easier for me to identify everything. When I tested it with a student who had a full load, I noticed I was only getting some of the submissions from one class in particular. I am using the "List submissions for multiple assignments" endpoint, and it is skipping 43 assignment submissions.
I think the same is true for the "List Assignments" API endpoint. They are missing here too.
"https://xxxx.instructure.com/api/v1/courses/course_id/assignments?per_page=100"
However, this is where it gets weird. The "Get user-in-a-course-level assignment data" API endpoint returns the assignments that are not returned from the first two API calls, and it was throwing a ton of errors in my script. I worked something in to run properly, but it does not match up any assignment or submission data for the report.
"https://xxxx.instructure.com/api/v1/courses/course_id/analytics/users/user_id/assignments"
I am not skilled in pagination, and that might be where my trouble lies. I was hoping someone could help me out here. We are using PowerShell for our SIS integration method, and for handling data it is very helpful. Just wanted to throw that out there so you know how I am scripting.
Thanks!
You might try duplicating what Canvas does when they load the gradebook. These are the submissions that show up in the gradebook. That doesn't include ungraded discussions or non-graded assignments, which might show up in the course participation data but not the submission data.
They also fetch more data than you might need if you're just looking for activity dates for financial aid information. I'll include all of it just in case your needs have grown.
Note that not all of the steps listed may apply to your situation. Some of the API calls have undocumented parameters that Canvas uses. If a per_page is not specified, it is typically defaulted to 10.
Step 6 is probably the most relevant one. This is in a class with 2 students + 1 test student. For larger classes, it fetched 10 students at a time in step 6. If you're only fetching one student, then you would need to use the one student ID.
@James that solved my problem!
After writing my original post, I wrote a function to determine the number of pages for a call so I can paginate correctly, so that will be helpful. Your suggestion for the Assignment Groups API helped because I do not have to paginate; it gives me all the assignment submission information in one call, so I know that I am getting everything.
For the Submissions API, all I had to do based on your suggestion was pass the grouped parameter (grouped=1), which groups all of a student's submissions the way the assignment groups call does. This removed the need to paginate and mess with the bookmark, which had me at a loss.
I also did not realize I could pass an exclude response field parameter. That may come in handy in the future.
Thanks for all your help!
Using a bookmark in a link is not an issue when you follow the recommendation on pagination (use the URL in the Next link header). It's when people try to speed up the process and make multiple calls by fetching in parallel by creating their own URLs that they run into trouble.
This next paragraph speaks in generalizations, don't hold me to exactness on this.
The bookmark is still not completely safe from missing data, though. What it does is contain the point where the request last left off so it knows where to pick up with the next call. For an easier to understand example of bookmarks, let's say you're fetching users and you leave off with "Smith, John" and the next time you're going to pick up with "Smith, Kevin" and then "Thomas, Bob." In the middle of fetching data, a new user gets added named "Doe, Jane". She will not get picked up in your data fetch because you've already fetched through the Smiths. With the numbered pagination (page=2, page=3), if you were sorting by last name (not a given) and inserted Jane, then (potentially) John would be pushed back into Kevin's spot and you would get John twice. If a record was deleted, then Kevin would move into John's spot and Bob into Kevin's spot and when you fetched the next record, you would get John and Bob, but Kevin would be left out. I don't think it sorted by last name, but that's the kind of thing that could happen that was supposed to be addressed by bookmarks. Canvas also used it to slow things down for enrollments (people were abusing the system by making too many calls concurrently and the enrollment endpoint was an expensive one).
Bookmarks are supposed to be "opaque," but when I looked at them for another call a few weeks ago, it was actually a base64 encoding of a JSON string. The one I was looking at could be deciphered to get "Smith, John" ... so not so opaque. That means you could still "cheat" the system by starting parallel requests using bookmarks, one ending at each letter of the alphabet ("AZZZZZ" would start at the B's). Fetch smaller page sizes and then abort the requests when you reached the next letter. I just came up with that last one; I haven't actually tested it, but if you look at the bookmarks and the code, it may give you an idea of what can be done.
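To illustrate that base64-of-JSON observation, here is a small sketch; the helper name and sample value are mine, and the bookmark format is undocumented and may change, so nothing should rely on it:

```ruby
require "base64"
require "json"

# Decode a Canvas-style pagination bookmark, which at the time of this
# thread was observed to be base64-encoded JSON. Do not rely on this format.
def decode_bookmark(bookmark)
  JSON.parse(Base64.decode64(bookmark))
end

# Round-trip a made-up bookmark to show the shape:
sample = Base64.strict_encode64([295_284_373].to_json)
decode_bookmark(sample) # => [295284373]
```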