MBlackwell_GCA
Community Member

District Gradebook Submissions Report via API


We are a fully virtual school serving 12,500 students across grades K-12, and part of our "compliance" grade is determined by

  • how many assignments the student has been assigned per course
    • and how many per assignment group per course
  • out of __ assignments, ___ were completed with a (%) value provided
  • what their individualized (cached) due date was
  • whether the assignment was completed
    • and if so, whether it was completed late

In other words, has the student been compliant in completing assignments?

The GET Submissions API call returns all the data needed using:

// fetchAll is our wrapper around fetch() that follows the Link headers for pagination
function getSubmissions(courseId) {
  return fetchAll(`${canvasDomain}/courses/${courseId}/students/submissions?student_ids[]=all`);
}
 
We can successfully pull one course at a time, but we have about 1,400 courses, and when the course ids are added to the array (const course = [];), the calls seem to fire all at once and then time out.
 
Other important info:
  • I am a total noob to coding and API calls. I am taking a course to learn, but at the moment I know just enough on my own to be dangerous 😂
  • Using JavaScript and Visual Studio Code. Started out in Postman but could not seem to get the call to work right there. 
  • Image of sample submissions file attached
  • Working with @ChrisPonzio 
9 Replies
matthew_buckett
Community Contributor

I haven't looked closely, but this kind of reporting looks like exactly the sort of thing Canvas Data was designed to support. It allows you to download data about all courses/assignments/submissions in your Canvas instance and then perform the analysis offline. There is a delay of 24-48 hours, but as long as you are OK with that, it's probably a quicker and easier way to answer a question like this.

werner_huysman1
Community Member

Hi, I agree with @matthew_buckett that Canvas Data is probably the best way to get such reports. Regarding your question about the JavaScript, it is hard to tell what the problem is without seeing the implementation of the fetchAll function. How did you handle the asynchronous nature of the API calls in a loop?

James
Community Champion

I'll attempt to address the original issue, although Canvas Data is another way to go. I gather all of this information (and more) nightly in about 10 minutes, but we only have 400-500 courses. I archive the data locally so I only have to fetch the new stuff each night and it goes a lot faster.

There are two or three potential issues I see.

I'm not sure what fetchAll is, but the name makes it sound like it is expecting an array rather than a single URL; fetch() is what is used for a single URL. On the other hand, it might take a single request and then watch the link headers to see what else needs to be fetched. It's hard to say, so this isn't my major concern; it's just hard to know, since fetchAll() isn't a built-in JavaScript function. There are also some bad fetchAll() implementations out there that blindly increase the page number without checking the link headers.
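For illustration, a header-aware fetchAll() might look something like this rough sketch; the accessToken variable (and canvasDomain, as in your snippet) are placeholders, not your actual code:

// Sketch only: fetch every page of a paginated Canvas endpoint by following
// the Link header's rel="next" URL instead of blindly incrementing page numbers.
const fetch = require('node-fetch');

async function fetchAll(url) {
  const results = [];
  let next = url;
  while (next) {
    const response = await fetch(next, {
      headers: { Authorization: `Bearer ${accessToken}` } // accessToken is a placeholder
    });
    if (!response.ok) {
      throw new Error(`Request failed: ${response.status} ${next}`);
    }
    results.push(...(await response.json()));
    // The Link header looks like: <https://...?page=2>; rel="next", <...>; rel="last"
    const link = response.headers.get('link') || '';
    const match = link.match(/<([^>]+)>;\s*rel="next"/);
    next = match ? match[1] : null;
  }
  return results;
}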

Each fetch() returns a promise. If you don't treat them as such, your program tends to finish before the API calls are made.

Watch out for throttling. You're limited in how hard you can hit the API at one time. There is an x-rate-limit-remaining header returned with each request, and if it gets down to 0, your requests will fail. The throttling algorithm is set up so that if you hit it with all the requests at one time, it will fail rather quickly.
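If you want to see how close you are getting, you can read that header off each response; a tiny sketch (the threshold and back-off time here are arbitrary, not values from Canvas):

// Sketch only: check the rate-limit bucket after each request and pause
// before the next one if it is getting low.
async function politeFetch(url, options) {
  const response = await fetch(url, options);
  const remaining = parseFloat(response.headers.get('x-rate-limit-remaining'));
  if (!Number.isNaN(remaining) && remaining < 100) {
    await new Promise((resolve) => setTimeout(resolve, 2000)); // back off for 2 seconds
  }
  return response;
}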

What I do is incorporate a throttling library (I use Bottleneck for JavaScript). I load node-fetch and then wrap it inside a Bottleneck limiter. Depending on the call I'm making, I may limit it to 10 concurrent API calls with a minimum of 100 ms between each one, or I might go up to 30 with a minimum delay of 25 ms. Getting the submissions may be intensive, so I would use a non-aggressive rate and then be prepared to wait. Slow and steady wins the race.

When I fetch the submissions, I use a maxConcurrent=100 and minTime=50 when I am fetching only the recent submissions (using the graded_since or submitted_since parameters) and maxConcurrent=30 and minTime=250 if I'm doing a complete fetch. I strongly recommend using the _since parameters to speed up the process.
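Wiring that together might look like this sketch, using the more conservative settings above; canvasDomain, accessToken, and lastRun are placeholders you would supply:

// Sketch only: wrap node-fetch in a Bottleneck limiter so at most 30 requests
// run at once, with at least 250 ms between the start of each one.
const Bottleneck = require('bottleneck');
const fetch = require('node-fetch');

const limiter = new Bottleneck({ maxConcurrent: 30, minTime: 250 });
const limitedFetch = limiter.wrap(fetch);

// Only ask for submissions that changed since the last run (lastRun is an ISO 8601 timestamp).
function getRecentSubmissions(courseId) {
  const url = `${canvasDomain}/courses/${courseId}/students/submissions` +
    `?student_ids[]=all&submitted_since=${encodeURIComponent(lastRun)}`;
  return limitedFetch(url, { headers: { Authorization: `Bearer ${accessToken}` } });
}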

As for the promises, I start with an array, then add the fetch commands to the array and use return Promise.all() on the array. It could be a little bit more robust -- I don't do a lot of checking for failed requests, but thankfully they don't happen that often.
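In your case that pattern might look roughly like this, reusing your getSubmissions(); the .catch on each request keeps one bad course from rejecting the whole batch:

// Sketch only: build one promise per course, then wait for all of them.
async function getAllSubmissions(courseIds) {
  const requests = courseIds.map((courseId) =>
    getSubmissions(courseId)
      .then((submissions) => ({ courseId, submissions }))
      .catch((err) => {
        // Record the failure instead of letting one bad course reject everything.
        console.error(`Course ${courseId} failed:`, err.message);
        return { courseId, submissions: [] };
      })
  );
  return Promise.all(requests);
}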

If you are putting 1400 requests into the array at a time, you may run into some other issues. I've had Node fail with a request that makes it look like it cannot find the hostname, but other people suggest it comes from putting in too many requests at once. It really only seems to happen to me when I'm using my Windows machine; I have three ISPs at home and some are a little flakier than others, and if I run it from my server, which is set to always use the same ISP, it seems to behave better.

What I might do if I had that many is to implement two Bottleneck limiters. I would feed one the course IDs and limit it to a few at a time. There would be no fetch directly inside of those. But each of those would then call the fetchAll() routine you have to actually get the data. That way, the queue most likely to fail (the Canvas API calls) isn't getting hit too heavily, and you can get all of the data for a class before moving on. Right now, my routines make the first submissions API call and then mix the fetch() calls for all of the pagination requests in with the others. If something fails, it still continues to make all the calls, even though it can't process them.
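A rough sketch of that two-limiter idea, reusing your fetchAll() and the fetch-level limiter from the sketch above:

// Sketch only: an outer limiter controls how many courses are in flight at once,
// while the fetch-level limiter above throttles the actual Canvas API calls that
// fetchAll() makes for each page of a course.
const Bottleneck = require('bottleneck');
const courseLimiter = new Bottleneck({ maxConcurrent: 5 });

function getSubmissionsForCourses(courseIds) {
  const jobs = courseIds.map((courseId) =>
    courseLimiter.schedule(() =>
      fetchAll(`${canvasDomain}/courses/${courseId}/students/submissions?student_ids[]=all`)
        .catch((err) => {
          console.error(`Course ${courseId} failed:`, err.message);
          return [];
        })
    )
  );
  return Promise.all(jobs);
}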

MBlackwell_GCA
Community Member

fetchAll is a fetch wrapper that handles the pagination based on the Link headers. We did finally come across the Bottleneck package and will have to test your maxConcurrent and minTime values. We have tested it a little, trying to figure out what our limits are without data loss. We also found better success using .map instead of a for loop for the courses. We finally also found that our greatest error was not writing in a catch(err), like the novices we are. We are also learning the limitations of what a JSON file can hold; it seems like it will only hold about 12 million lines of data, and these submissions calls fill that up easily.
Would you prefer the gradebook feed or the gradebook submissions call?

@James I am super interested to learn more about how you "archive the data locally so I only have to fetch the new stuff each night and it goes a lot faster," since we do plan to run this regularly for 700-800 courses. Script examples are attached.

Attachments: getSubmissions.js.png, fetchAll.png, submissions.js.png, example results.png

James
Community Champion

@MBlackwell_GCA 

What I do is write the results to a database rather than a JSON file. Then, before I run the script to fetch the data, I query the database to see when it last ran, and I back that time up to before the script started running. I could get the most recent entry for each course rather than one for the entire script.

I run the API call twice: once using submitted_since and again using graded_since. This avoids having to download all of the grades from early in the semester that haven't changed in 4 months just to get the 3 submissions for the day.
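Something like this sketch, where lastRun is whatever timestamp you stored from the previous run (deduplicating on the submission id is just one way to merge records that show up in both result sets):

// Sketch only: hit the submissions endpoint twice with a _since filter, once for
// newly submitted work and once for newly graded work, then merge the results.
async function getChangedSubmissions(courseId, lastRun) {
  const base = `${canvasDomain}/courses/${courseId}/students/submissions?student_ids[]=all`;
  const since = encodeURIComponent(lastRun); // e.g. '2021-03-01T05:00:00Z'
  const [submitted, graded] = await Promise.all([
    fetchAll(`${base}&submitted_since=${since}`),
    fetchAll(`${base}&graded_since=${since}`)
  ]);
  // A submission can appear in both lists, so keep one copy per submission id.
  const byId = new Map();
  [...submitted, ...graded].forEach((s) => byId.set(s.id, s));
  return [...byId.values()];
}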

I'm using MySQL because that's what I'm familiar with and it fits my budget, but other databases would work. I use a REPLACE statement, so if the record already exists it gets overwritten, and if it is missing it gets inserted. The submissions table has all of the fields for the object you showed, but I also have tables for assignments, assignment_groups, courses, enrollments, users, and some other things needed to make it all make sense.
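A minimal sketch of that upsert with the mysql2 package; the connection details, table name, and column list here are placeholders rather than my actual schema:

// Sketch only: REPLACE overwrites an existing row with the same primary key and
// inserts a new one otherwise, so re-running the script is safe.
const mysql = require('mysql2/promise');

async function saveSubmissions(submissions) {
  const db = await mysql.createConnection({
    host: 'localhost',
    user: 'canvas',
    database: 'canvas',
    password: process.env.DB_PASSWORD
  });
  for (const s of submissions) {
    await db.execute(
      'REPLACE INTO submissions (id, assignment_id, user_id, score, submitted_at, cached_due_date, late, workflow_state) VALUES (?, ?, ?, ?, ?, ?, ?, ?)',
      [s.id, s.assignment_id, s.user_id, s.score, s.submitted_at, s.cached_due_date, s.late, s.workflow_state]
    );
  }
  await db.end();
}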

All of my processing is done from the database, rather than the results as they come in from Canvas.

You might be able to fetch things faster using GraphQL, but I wouldn't use the Grade Audit Log, as it contains a different set of data and doesn't include all of the information that you need.

WOW! Talk about timing. Seems I will be learning about GraphQL in my course this week. 😂 Thank you for your responses, @James!

@matthew_buckett To be honest, I have not spent nearly enough time exploring the Canvas Data resource. The preference at my district is for real-time data, and the 48-hour delay leaves much to be desired. Not that it isn't still good data! I remember back in the day when we were happy to have week-old data! I still need to explore this resource more and appreciate you sharing the link.

Hi @werner_huysman1 

I hope the images shared in my response above help!