I think I'm running into an API bug and am hoping you can confirm. I know the URLs won't work for most of you, but hopefully you can recognize something I'm missing.
I am calling the Submissions API to return all the submissions for a particular quiz in a particular course. FWIW, there are 6300 students in this course.
The Course (982096)
The Assignment : Quiz 5 (3156530)
The student is Adam Wetsch (ID 3867214). His submission clearly has a score of 100; I can click it and see his submission to the quiz.
However, the API is not including his submission in the results. I am iterating over the 600-ish page calls, and I do not see his submission in the list. I see more than 6000 other submissions, but his (and a few others) are not included in that list.
Here is the API call I am making
If I use the API to load his submission directly by UserID, it loads fine.
I've put in a ticket with support to see if they can replicate what I'm seeing.
Has anyone else used Submissions and found that some records are missing?
You might want to append a ?per_page=50 to the end of your GET statements like this. That will cut the number of API calls by a factor of 5 since the default is 10.
I do use the list course assignment submissions on our mandatory orientation (which has had several thousand students at a time in it, currently with only 1585 since we just reset the version for summer/fall enrollment). However, my approach is a little different. Once they've completed the assignment I'm checking, I delete them from the course. Since we run the process every 20 minutes and quiz submissions don't show up until they've completed the quiz or had a grade assigned, there's rarely more than a handful of students who are returned with the call. Still, every time that someone has called to complain that they're still in the orientation and not able to get into their courses, it's been something other than the programming.
I also know that you could get 1000 people who say it's never happened to them, but that's not a proof. All it takes is one time where it doesn't work. You might be the counter-example that proves it broken.
So, let's talk debugging steps. You've probably already checked all these things, but just in case one of them makes you go aha!, I thought I'd ask.
When you pull up Adam's score by specifying his ID on the API call, is there anything out of the ordinary when compared with other students who do show up in the big list?
Are you sure the code to pull all the submissions is really pulling all of the submissions? Are you using the Link header to get the next page or are you manually trying to generate the links? Are you processing the information as it comes in or gathering it all into one body for processing at the end?
Have you tried dumping the URIs that are called to look for a missing or out-of-sequence call? If there is one, that might indicate where the problem is and even whether it's on your end or Canvas's.
Is it possible that new students are getting added to the course while you're fetching the assignment list? Yes, I'm grasping, but I'm wondering if another student got inserted, if it might skip over someone. I don't program in Ruby, but I know a guy who does who said he's had issues assigning unique id's when two people hit it at the same time but on multiple threads. I think the logic just needs fixed, but I'm trying to eliminate all possibilities, no matter how far fetched they may be.
Is there a regular pattern to the ones that are missing? Does changing the ?per_page parameter affect which ones are missing?
Are you looking for Adam in the raw dump from the API calls or are you doing some processing first and then looking for him? Can you identify where Adam should fall in the progression and then see if there is something in the previous entry that is causing an issue (I honestly have no idea what that would be - again, grasping, throwing out things and hoping something sticks)?
Is Adam a super-hacker? Does he have a role other than student? Does he have a role other than student somewhere else in Canvas?
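For the question above about following the Link header rather than building page URLs by hand, here is a minimal Python sketch of what I mean. It is not the code anyone in this thread is running. The fetch callable, the example.invalid URLs, and the fake page data are all made up so it runs offline; in real use fetch would wrap whatever HTTP client you have and return the parsed records along with the raw Link header.

```python
import re

# Matches one entry of an HTTP Link header: <url>; rel="name"
LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def next_link(link_header):
    """Return the URL tagged rel="next" in a Link header, or None if absent."""
    for url, rel in LINK_ENTRY.findall(link_header or ""):
        if rel == "next":
            return url
    return None


def fetch_all(first_url, fetch):
    """Follow rel="next" links until none is left.

    `fetch` is a hypothetical callable: fetch(url) -> (records, link_header).
    Returns every record plus the list of URLs actually requested.
    """
    records, visited = [], []
    url = first_url
    while url:
        page, link_header = fetch(url)
        visited.append(url)
        records.extend(page)
        url = next_link(link_header)
    return records, visited


# Stand-in for the real API so the sketch runs offline (made-up URLs and data).
FAKE_PAGES = {
    "https://example.invalid/subs?page=1&per_page=2":
        ([1, 2], '<https://example.invalid/subs?page=2&per_page=2>; rel="next"'),
    "https://example.invalid/subs?page=2&per_page=2":
        ([3], '<https://example.invalid/subs?page=1&per_page=2>; rel="first"'),
}

records, visited = fetch_all(
    "https://example.invalid/subs?page=1&per_page=2", FAKE_PAGES.__getitem__
)
print(records)
print(len(visited), "requests")
```

Keeping the list of visited URLs around also gives you the URI dump to inspect afterwards.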
I'm afraid this isn't much help. This is one of those things where nothing obvious jumps out at me and I use the same API calls.
I have had issues with the API not returning information, but that was with the quiz submissions, not the assignment submissions. That was an issue of it only returning one submission while the documentation indicated it should return all of them. I think that's a different issue than what you're facing.
Thanks for the thoughts. I double checked my stuff and tried a few, but to no avail.
I changed the per_page to 50, and that loaded much faster overall. Good to know in general, thanks for that. But my mystery student still wasn't included in the results.
I am printing the URIs and they appear to be in sequence. With 121 URIs, it'll be tedious to verify them all manually...
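Checking the sequence doesn't have to be manual. Below is a rough Python sketch that reads the page number out of each logged URI and reports gaps, repeats, and out-of-sequence steps. It assumes the page parameter is a plain integer and that a URI without one means page 1. The example.invalid URIs are made up, with a deliberate gap and repeat so there is something to report.

```python
from urllib.parse import urlparse, parse_qs


def page_numbers(uris):
    """Extract the numeric page parameter from each URI (a missing one counts as page 1)."""
    pages = []
    for uri in uris:
        query = parse_qs(urlparse(uri).query)
        pages.append(int(query.get("page", ["1"])[0]))
    return pages


def sequence_problems(uris):
    """Report page numbers that are missing, repeated, or not one more than the previous call."""
    pages = page_numbers(uris)
    seen = set(pages)
    return {
        "missing": sorted(set(range(1, max(pages) + 1)) - seen),
        "repeated": sorted(p for p in seen if pages.count(p) > 1),
        "out_of_sequence": [(a, b) for a, b in zip(pages, pages[1:]) if b != a + 1],
    }


# Made-up log: page 3 was never requested and page 4 was requested twice.
LOGGED = [f"https://example.invalid/subs?page={n}&per_page=50" for n in (1, 2, 4, 4, 5)]
report = sequence_problems(LOGGED)
print(report)
```

An empty report on the real 121 URIs would at least rule out a skipped or repeated request on the client side.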
FWIW, these are quiz submissions. This student in particular has only the single submission for this quiz. I'll keep digging, and I'll update my ticket with Canvas.
I think I'm hitting a bug here with Submissions.
There are 6028 records to be returned by the API.
If I make the following calls via API, all three return 6028 records total.
I extract the Submission ID, student ID, and score from each submission record and emit that to a file.
When I sort the files and look for duplicate rows ( sort FILE | uniq -d ), the per_page=100 file has 0 duplicates, the per_page=33 file has 66 duplicates, and the per_page=25 file has 75 duplicate rows.
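For anyone who wants to repeat this comparison, here is a small Python sketch of the same check. It counts duplicated rows within one run and also lists rows that are in one run but not in another. Since every run returns the same total, each extra copy of a row has to be taking the place of some record that never arrived. The rows below are made up; the format (submission ID, student ID, score) just follows what I described above.

```python
from collections import Counter


def duplicate_rows(rows):
    """Rows that occur more than once, with their counts (like sort | uniq -d, plus counts)."""
    return {row: n for row, n in Counter(rows).items() if n > 1}


def missing_rows(reference, other):
    """Rows present in the reference run but absent from the other run."""
    return sorted(set(reference) - set(other))


# Made-up rows in the form submission_id,student_id,score
run_100 = ["501,9001,100", "502,9002,85", "503,9003,70", "504,9004,95"]
run_33 = ["501,9001,100", "503,9003,70", "503,9003,70", "504,9004,95"]

print(duplicate_rows(run_33))
print(missing_rows(run_100, run_33))
```

Treating the per_page=100 file as the reference is only an assumption based on it having 0 duplicates; it has not been shown to be complete.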
For the life of me it appears Canvas is repeating one or more pages. But that's not it. The records are scattered throughout the output stream.
Does this make sense to anybody else?
The numbers you gave are interesting, because each pair adds up to 100 (ok, 33+66=99). That highly suggests a definite programming mistake somewhere. Whether it's in Canvas or in Ruby remains to be seen.
The first thing that comes to mind is a rounding error in math related to the inexact representation of decimals in a binary system. There are various functions used for rounding: round, trunc, int, floor, ceil.
The links are sent as page=? and per_page=?. Since you're not calculating the specific rows to return, the mistake is unlikely to be on your end.
So, let's say that page=10 and per_page=33. Then the pages should be rows 1-33, 34-66, 67-99, and so on, so that by the time you get to page 10, you have rows 298-330.
But maybe, just maybe, the computer is off slightly in the representation, so it's not getting 298, it's getting 297-329 and repeating an entry, and Adam turns out to be #330, which doesn't get called.
Normally, integer calculations aren't prone to the same type of mistakes as decimals, but who knows? You might try running a per_page of 64 and see if it returns 0 duplicates or 36. I picked 64 since it's the largest power of 2 less than the per_page limit of 100 and can be represented exactly in binary.
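To make the arithmetic above concrete, here is a short Python sketch of the page-to-row mapping as I described it. This is only the textbook integer calculation, not what Canvas actually does internally. It also checks that the pages cover every row exactly once for the per_page values mentioned in this thread.

```python
def page_rows(page, per_page):
    """First and last 1-based row numbers that a given page should contain."""
    return (page - 1) * per_page + 1, page * per_page


def page_count(total, per_page):
    """Number of pages needed for `total` rows (ceiling division on integers)."""
    return -(-total // per_page)


def covers_exactly(total, per_page):
    """True if the pages together contain rows 1..total once each, in order."""
    rows = []
    for page in range(1, page_count(total, per_page) + 1):
        first, last = page_rows(page, per_page)
        rows.extend(range(first, min(last, total) + 1))
    return rows == list(range(1, total + 1))


print(page_rows(10, 33))     # (298, 330)
print(page_count(6028, 50))  # 121
print(all(covers_exactly(6028, n) for n in (25, 33, 50, 64, 100)))
```

With pure integer math there is no room for an off-by-one at any of these page sizes, which is why a rounding problem would have to come from somewhere else in the chain.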
Anyway, if this is what's going on, then it sounds like something on Canvas' side.
I guess we could look at the source code to Canvas and see if there is anything obvious. But even if you find the error, you still have to have them fix it unless you're running your own instance.
In your dump of the JSON data, can you determine how the data is ordered when it arrives from the API call? Is it by user_id, submission_id, ???
The reason I ask is because I found a note in the PostgreSQL documentation that talks about how specifying an ORDER on the query is absolutely critical when doing pagination and that if you don't you might get different results. They say it's not a bug, it's a consequence of not using an ORDER clause on the SQL statement.
I'm trying to find where in the Canvas source code the lookups are, but I'm not a Ruby programmer and haven't really done MVC programming either, so I'm not making much progress. It might be quicker for you to find the order than it is for me to find the relevant code.
If it turns out that there is no order to the query, that might explain why it's messing up. You'd still have to have Canvas fix it. If there is a definite order, then we look somewhere else.
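To show why a missing order would produce exactly these symptoms, here is a toy Python simulation. It is not Canvas or PostgreSQL code, and a full reshuffle on every request is far more chaotic than a real database would be. Each page request slices a fresh result list: once with a fixed order, once with an order that changes between requests. The row count and page size are arbitrary.

```python
import random


def paginate(query, total, per_page):
    """Fetch every page by slicing whatever row order the query returns on that request."""
    fetched = []
    for offset in range(0, total, per_page):
        fetched.extend(query()[offset:offset + per_page])
    return fetched


def summarize(fetched, total):
    """Count rows returned, extra copies of already-seen rows, and rows never seen."""
    unique = set(fetched)
    return {
        "returned": len(fetched),
        "extra_copies": len(fetched) - len(unique),
        "missing": total - len(unique),
    }


TOTAL, PER_PAGE = 100, 25
ids = list(range(1, TOTAL + 1))
rng = random.Random(0)  # seeded so repeated runs behave the same


def ordered_query():
    return sorted(ids)  # same order on every request


def unordered_query():
    return rng.sample(ids, len(ids))  # different order on every request


stable = summarize(paginate(ordered_query, TOTAL, PER_PAGE), TOTAL)
unstable = summarize(paginate(unordered_query, TOTAL, PER_PAGE), TOTAL)
print("ordered:  ", stable)
print("unordered:", unstable)
```

In both cases the total number of rows returned equals the number of records, so the count alone looks fine. With an unstable order, every extra copy is matched by a record that never shows up, which is the pattern reported above: the right total, some duplicates, and a few students missing.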
@glparker , I looked at an API request of 36 quiz submissions from my class and was unable to discern any sort order. I looked at the user_id, the submission_id, the submitted/graded datetimes, and anything else that had unique values. So we may be onto something here. The PostgreSQL documentation said that you need to specify an ORDER clause to get consistent results with different OFFSET and LIMIT statements. I cannot guarantee that will fix the issue, but it sounds like a place for them to start looking.
I've been told that the sort order is always the order in which submissions were made. That corresponds with the submission object's "id" attribute. Grading a submission doesn't change the order they are returned in the API.
I believe I was told that at one point, too, and it would seem reasonable, but the evidence speaks louder than the words of someone. Unfortunately, there are a lot of people in the world speaking about things of which they do not know.
To illustrate, I ran the list of now 33 submissions for one of the assignments in my class.
I reindexed the submission and user ids to provide anonymity so I don't trigger a FERPA violation. I saved the entire download into one array, iterated through it, saved all the submission ids and user ids, sorted each list numerically from lowest to highest, and then used the position in the list in the report below. Besides anonymizing, it provides a much easier way to see if they are in order.
The first column is the reindexed Submission ID, the second column the reindexed User ID, the third column is the Workflow State, and the last column, when present, is the Submitted At timestamp.
The order in the list below is the order they were returned via the API call.
As you can see, there is no easily discernible order.
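For anyone who wants to run the same check on their own course, the reindexing described above can be sketched in Python roughly like this. It is a reconstruction of the idea, not the script I actually used. The key names (id, user_id, workflow_state, submitted_at) are assumed to match the submission JSON, and the three records are made up.

```python
def reindex(values):
    """Map each distinct value to its 1-based position in sorted order."""
    return {value: rank for rank, value in enumerate(sorted(set(values)), start=1)}


def anonymize(submissions):
    """Replace real submission and user IDs with their ranks, keeping API order."""
    sub_rank = reindex(s["id"] for s in submissions)
    user_rank = reindex(s["user_id"] for s in submissions)
    return [
        (sub_rank[s["id"]], user_rank[s["user_id"]],
         s["workflow_state"], s.get("submitted_at"))
        for s in submissions
    ]


def is_ascending(sequence):
    """True if each item is at least as large as the one before it."""
    return all(a <= b for a, b in zip(sequence, sequence[1:]))


# Made-up records, listed in the order a hypothetical API call returned them.
submissions = [
    {"id": 7003, "user_id": 42, "workflow_state": "graded",
     "submitted_at": "2015-03-02T10:00:00Z"},
    {"id": 7001, "user_id": 99, "workflow_state": "unsubmitted",
     "submitted_at": None},
    {"id": 7002, "user_id": 17, "workflow_state": "graded",
     "submitted_at": "2015-03-01T09:30:00Z"},
]

rows = anonymize(submissions)
for row in rows:
    print(row)
print("sorted by submission id:", is_ascending([r[0] for r in rows]))
print("sorted by user id:", is_ascending([r[1] for r in rows]))
```

If the results really came back in submission order, the first column would simply count 1, 2, 3, and so on.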