Increasing TPS for Canvas API requests

26-05-ins-111
Community Novice

Hello Canvas Developers,

We have conducted a performance test of syncing grades to Canvas (we are using a self-hosted Canvas instance image in AWS). Our goal was to determine the infrastructure and TPS required on our system to sync the expected number of grades within the expected time. After completing this test we have some questions, and we would really appreciate your feedback:

  • To make sure that we are not abusing Canvas as a system, we throttle before Canvas does. Currently in production we have set the TPS to 40 (this TPS is shared across all users/instructors in our system), and while running the performance test against the self-hosted Canvas instance we observed that it should be increased to at least 64 while retaining the same number of concurrent API requests. How do we confirm that the same setting will not create any issues at "instructure.com"?
  • Currently in production we make a maximum of 16 concurrent API requests (assignment create/update and grade submission for all users in the system). To meet our performance requirement, we would need to triple this (i.e. a maximum of 48 concurrent API requests). How can we confirm that this would not be an issue at "instructure.com"?
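For context, client-side throttling of the kind described above (a fixed TPS cap shared across all users) is often implemented as a token bucket. Here is a minimal, illustrative Python sketch; the class name and the 40-TPS figure are taken from the scenario above, not from any Canvas library:

```python
import threading
import time

class TokenBucket:
    """Simple client-side rate limiter: refills `rate` tokens per
    second up to `capacity`, so callers throttle themselves before
    Canvas's own throttling kicks in."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second (e.g. 40 TPS)
        self.capacity = capacity    # burst ceiling
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                elapsed = now - self.last
                self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)
```

Each worker thread would call `bucket.acquire()` before issuing an API request, so the aggregate request rate stays at or below the configured TPS regardless of how many threads are running.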

References we followed,

http://canvas.instructure.com/doc/api/file.throttling.html
https://community.canvaslms.com/docs/DOC-8381-api-rate-limiting

1 Solution

Don't hard-code the limits. The document you linked to, Throttling - Canvas LMS REST API Documentation, refers to the X-Rate-Limit-Remaining header that is returned with each response. At the 2017 InstructureCon, one of the Canvas engineers explained that it's not really a slow-down; it's more of a stops-working once that value gets down to 0.

The best practice is not to blindly make 1024 API calls per second, but to monitor that number and make sure it doesn't get to 0. The engineer explained that in normal circumstances people don't get anywhere close to that, but 1024 connections per second from the same user may well reach the threshold.
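One way to act on that header is to compute a back-off delay from its value after each response. This is an illustrative sketch, not an official recipe: the `floor` and `max_delay` values are arbitrary tuning knobs I made up, and the quota figure in the comment is the commonly documented default rather than a guarantee.

```python
from typing import Optional

def throttle_delay(remaining: Optional[str],
                   floor: float = 300.0,
                   max_delay: float = 5.0) -> float:
    """Map the X-Rate-Limit-Remaining header to a pre-request delay.

    Canvas's quota bucket is commonly documented as starting at 700
    "units" per token, with requests rejected once it reaches 0, so we
    slow down as the remaining quota shrinks instead of hard-coding a
    fixed TPS."""
    if remaining is None:
        return 0.0              # header missing: assume no throttling needed
    value = float(remaining)
    if value >= floor:
        return 0.0              # plenty of quota left, no delay
    # scale the delay linearly as the quota approaches zero
    return max_delay * (1.0 - value / floor)
```

In a request loop you would read `response.headers.get("X-Rate-Limit-Remaining")` after each call and `time.sleep(throttle_delay(...))` before the next one, letting the server's view of your quota, not a fixed constant, set your pace.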

And if you're finding that 16 threads at 64 TPS is the limit of your testing server where you're the only user, then I am almost certain it would not work on a hosted instance unless you coordinate with Canvas and let them know your needs so they can scale accordingly.

The limits from your testing cannot be hard-coded on a production site because there you're in a shared environment with other institutions, and students are actually using the system. Server load goes up and down, which means the cost of each call may vary as well; while 16 threads at 64 transactions per second might be okay at one moment, it may not be at another time.

I would also make sure you're using the most efficient API calls. Instead of making one call per student per assignment, make one call for the entire assignment with all of the students. Each call will take longer to process and have a higher cost, but it will reduce the number of connections and save time overall.
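As a concrete example of that batching idea, Canvas's Submissions API has a bulk endpoint (`POST /api/v1/courses/:course_id/assignments/:assignment_id/submissions/update_grades`) that accepts grades for many students in one request. The helper below only builds the form payload; the student IDs and grades are made-up sample data:

```python
from typing import Dict

def build_bulk_grade_payload(grades: Dict[int, str]) -> Dict[str, str]:
    """Build the form body for Canvas's bulk grade-update endpoint,
    which grades many students in one API call instead of one call
    per student. Keys follow the grade_data[<student_id>][posted_grade]
    parameter naming from the Submissions API."""
    payload = {}
    for student_id, grade in grades.items():
        payload[f"grade_data[{student_id}][posted_grade]"] = grade
    return payload

# Hypothetical usage: one request covers the whole assignment.
# requests.post(url, data=build_bulk_grade_payload({101: "95", 102: "B+"}),
#               headers={"Authorization": "Bearer <token>"})
```

Note that, as I recall, this endpoint queues the work and returns a Progress object you can poll, which fits the "let Canvas process it in the background" pattern discussed below.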

A question is why you think a grade sync has to be accomplished so quickly. I would think those would normally be background processes that you run from a server in the middle of the night, not something that has to happen quickly while people are using the system. But I don't do grade sync either, and if this is a button that faculty push and you want the grade sync to happen immediately, then it might be necessary.

Even then, when I do a CSV import for the gradebook, where I upload all of the grades and let Canvas process them in the background, it's not immediate -- even when I have just a few grades. Maybe the grade sync should trigger a process that runs on a server and handles it in the background rather than doing it while they wait.
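That trigger-and-return pattern can be sketched with a simple in-process work queue; in production you would more likely use a real job system, and `sync_fn` here is a stand-in for whatever function actually pushes one batch of grades to Canvas:

```python
import queue
import threading

def start_grade_sync_worker(job_queue: "queue.Queue", sync_fn):
    """Run grade-sync jobs from a queue in the background so the
    faculty-facing request can enqueue a job and return immediately."""
    def worker():
        while True:
            job = job_queue.get()
            if job is None:          # sentinel: shut the worker down
                job_queue.task_done()
                break
            try:
                sync_fn(job)         # e.g. one bulk grade upload to Canvas
            finally:
                job_queue.task_done()
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The button handler just does `job_queue.put(job)`; the worker drains jobs at whatever pace the rate limits allow.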

Another possibility to consider is using multiple users / access tokens to distribute the load and thus increase the amount of data you can send through without hitting the limit for each one. That may not work as well as imagined, though: if you are pounding the server as one user, the load on the server goes up and things slow down for other people, which means the limit costs for the additional users sending the information would be higher and you wouldn't be able to get as much done.
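Mechanically, distributing load across tokens is just round-robin selection of the Authorization header. A minimal sketch (the token strings are placeholders, and per the caveat above this only spreads quota, it doesn't reduce server load):

```python
import itertools
from typing import Dict, Iterable

class TokenRotator:
    """Cycle through several access tokens so each token's quota
    bucket is consumed more slowly than it would be by one token
    carrying all of the traffic."""

    def __init__(self, tokens: Iterable[str]):
        self._cycle = itertools.cycle(tokens)

    def next_header(self) -> Dict[str, str]:
        """Return the Authorization header for the next request."""
        return {"Authorization": f"Bearer {next(self._cycle)}"}
```

Each outgoing request would merge `rotator.next_header()` into its headers, so consecutive requests are billed against different quota buckets.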
