26-05-ins-111
Community Participant

Increasing TPS for Canvas API requests

Hello Canvas Developers,

We have run a performance test that syncs grades to Canvas, using a self-hosted Canvas instance image in AWS. Our goal was to determine the infrastructure and TPS our system needs in order to sync the expected number of grades within the expected time. Having completed the test, we have a few questions and would really appreciate your feedback on them:

  • To make sure that we do not abuse Canvas, we throttle on our side before Canvas does. In production we currently cap the rate at 40 TPS (shared across all users/instructors in our system). The performance test against the self-hosted Canvas instance showed that this should be raised to at least 64 TPS while keeping the same number of concurrent API requests. How can we confirm that using the same setting in production will not cause any issues at "instructure.com"?
  • In production we currently make a maximum of 16 concurrent API requests (assignment create/update and grade submission, across all users in the system). To meet our performance requirement, we would need to triple this, to a maximum of 48 concurrent API requests. How can we confirm that this would not be an issue at "instructure.com"?
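For readers who want to picture the kind of sender-side throttle described in the two points above, here is a minimal Python sketch. It is not the poster's implementation: the TokenBucket class and the run_throttled helper are hypothetical names, and the token-bucket approach is just one common way to cap an average request rate while a semaphore caps concurrency.

```python
import threading
import time


class TokenBucket:
    """Client-side limiter: allows at most `rate` requests per second on average."""

    def __init__(self, rate, capacity=None, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity if capacity is not None else rate)
        self.tokens = self.capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.last = clock()
        self.lock = threading.Lock()

    def try_acquire(self):
        """Take one token if available; return True on success, False otherwise."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at the bucket capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False


def run_throttled(send, bucket, slots, sleep=time.sleep):
    """Wait for a rate token, then call `send` while holding a concurrency slot."""
    while not bucket.try_acquire():
        sleep(0.005)  # short poll interval; an arbitrary illustrative value
    with slots:
        return send()
```

With the production figures from the post, the sender would create TokenBucket(rate=40) and threading.BoundedSemaphore(16) once and route every Canvas call through run_throttled. Raising the numbers to 64 and 48 is then a configuration change, although whether the hosted service tolerates them is a separate question.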

References we followed:

http://canvas.instructure.com/doc/api/file.throttling.html
https://community.canvaslms.com/docs/DOC-8381-api-rate-limiting

Accepted Solutions
James
Community Champion

Don't hard-code the limits. In the document you linked to, Throttling - Canvas LMS REST API Documentation, it refers to the X-Rate-Limit-Remaining header that is returned. At the 2017 InstructureCon, one of the Canvas engineers explained that it's not really a slow-down, it's more of a stops-working once that gets down to 0.

The best practice is not to blindly make 1024 API calls per second, but to monitor that number and make sure it doesn't get to 0. The engineer explained that in normal circumstances, people don't really get anywhere close to that. 1024 connections per second from the same user may well reach that threshold.
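As a rough illustration of monitoring that number instead of hard-coding a rate, here is a small Python sketch. The only thing taken from the throttling document is the X-Rate-Limit-Remaining header name. The function names, the low_water threshold of 200, and the 2-second maximum pause are assumptions made up for the example, not Canvas settings.

```python
def remaining_quota(headers):
    """Return X-Rate-Limit-Remaining as a float, or None if absent or unparseable."""
    for name, value in headers.items():
        if name.lower() == "x-rate-limit-remaining":
            try:
                return float(value)
            except (TypeError, ValueError):
                return None
    return None


def backoff_delay(remaining, low_water=200.0, max_delay=2.0):
    """Seconds to pause before sending the next request.

    low_water and max_delay are illustrative values, not Canvas settings.
    """
    if remaining is None or remaining >= low_water:
        return 0.0  # plenty of quota left (or no information): do not slow down
    if remaining <= 0:
        return max_delay  # quota exhausted: wait the longest
    # Scale linearly: the closer the quota is to zero, the longer the pause.
    return max_delay * (1.0 - remaining / low_water)
```

A sender would call backoff_delay(remaining_quota(response.headers)) after each response and sleep for that long before the next request, so the request rate follows whatever the server reports at that moment.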

And if you're finding that 16 threads at 64 TPS is the limit of your testing server where you're the only user, then I am almost certain it would not work on a hosted instance unless you coordinate with Canvas and let them know your needs so they can scale accordingly.

The limits from your testing cannot be hard coded on a production site because now you're in a shared environment with other institutions and students are actually using the system. Server load goes up and down, which means that the cost may vary as well and while 16 threads of 64 transactions per second might be okay at one moment, it may not be at another time.

I would consider making sure you're using the most efficient API calls. Instead of making one call for each student for each assignment, make one call for the entire assignment with all of the students. It will take longer to process each call and have a higher cost, but it will reduce the number of connections and save time overall.
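To make the "one call for the entire assignment" idea concrete, here is a hedged Python sketch that only builds the request and does not send it. It assumes the bulk endpoint is the Submissions API's update_grades action and that grades are passed as grade_data[<student_id>][posted_grade] form fields; check the path and parameter names against the current API documentation before relying on them. The bulk_grade_request helper and the example host are hypothetical.

```python
from urllib.parse import urlencode


def bulk_grade_request(base_url, course_id, assignment_id, grades):
    """Build the URL and form body for one bulk grade update.

    `grades` maps student id -> posted grade. The path and field names are
    assumptions based on the Canvas Submissions API; verify before use.
    """
    path = (
        f"/api/v1/courses/{course_id}/assignments/{assignment_id}"
        "/submissions/update_grades"
    )
    fields = [
        (f"grade_data[{student_id}][posted_grade]", str(grade))
        for student_id, grade in grades.items()
    ]
    return base_url.rstrip("/") + path, urlencode(fields)
```

For a class of 30 students this is one POST instead of 30, which is the trade described above: each call costs more, but far fewer connections count against the limit.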

A question is why you think that a grade sync has to be accomplished so quickly? I would think those would normally be background processes that you run from a server in the middle of the night, not something that has to happen quickly while people are using the system. But I don't do grade sync, either, and if this is a button that faculty push and you want the grade sync to happen immediately, then it might be necessary. Even then, when I'm doing a CSV import for the gradebook, where I upload all of the grades and let Canvas process in the background, it's not immediate -- even when I have just a few grades. Maybe the grade sync should trigger a process that runs on a server and handles it as a background process rather than doing it as they wait.
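One way to picture "trigger a process and handle it in the background" is a simple in-process job queue, sketched below in Python. This is only an assumption about how such a sender might be structured: start_sync_worker and handle_job are hypothetical names, and a real deployment would more likely use a persistent queue than an in-memory one.

```python
import queue
import threading


def start_sync_worker(handle_job):
    """Start a background thread that processes grade-sync jobs one at a time.

    Returns (jobs, failed, worker). Put a job on `jobs` to schedule it and
    put None to ask the worker to stop. Jobs whose handler raised are
    collected in `failed` so they can be retried later.
    """
    jobs = queue.Queue()
    failed = []

    def run():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: shut the worker down
                jobs.task_done()
                break
            try:
                handle_job(job)
            except Exception:
                failed.append(job)  # keep the worker alive; retry policy is up to the caller
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return jobs, failed, worker
```

The button handler then only calls jobs.put(...) and returns immediately, while the worker drains the queue at whatever pace the rate limit allows.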

Another possibility to consider is using multiple users / access tokens to distribute the load and thus be able to increase the amount of data sent through without hitting the limit for each one. That may not work as well as imagined, though. If you are pounding the server on one user, then the load on the server goes up and it slows down things for other people, which means that the limit costs for the additional users sending the information would be higher and you wouldn't be able to get as much done.

8 Replies
James
Community Champion

Excuse my ignorance, but if you are self-hosting, then how would it create an issue at "instructure.com"?

stuart_ryan
Community Coach

I echo James' question: are you using this as a test bed before you move to a Canvas Hosted instance? If not and you are planning on self-hosting within AWS, then what you do on your own instance will not have any impact on any instructure.com instances.

Also, I am curious: when you refer to throttling, is that something you are doing at the transaction sender, or on Canvas itself? Not having installed the self-hosted version completely yet, I would like to understand where you are setting (or finding) each of these limits. That should help us better understand what you are attempting and give you any relevant advice.

Cheers,
Stuart

James
Community Champion

Without knowing how you have things set up, I cannot be sure, but you may not be using the most efficient API calls to do the grade sync. You can retrieve and submit more than one grade in a single API call and by leveraging that, you may be able to reduce the number of calls that you make.

Also, if you're running your own instance, you should be able to access the database directly and see which grades need synced and cut down on the number of API calls needed to do that.

26-05-ins-111
Community Participant

We are using this self-hosted instance in AWS for testing purposes only. Once tested, the same settings would be applied against "instructure.com".

Let me clarify what we are doing. As Stuart Ryan mentioned, we have a transaction sender that calls the Canvas APIs to submit grades. We have implemented throttling on this transaction sender and ran a performance test as follows:

  • A maximum of 16 concurrent API calls go out from the transaction sender to Canvas, at a maximum of 64 TPS.

Please note that we used a self-hosted Canvas instance in AWS to conduct this performance test.

Is it safe to send such a load from our transaction sender to the actual "instructure.com"? How can we get confirmation before we send it?

Please let me know if anything is still unclear.

Thanks in advance,

Thilina

26-05-ins-111
Community Participant

James Jones, thank you very much for the detailed explanation; it was very useful in determining our next steps.

As an LTI launcher and grade synchronizer from Tool Providers to Tool Consumers, our system should have a generic throttling mechanism for all LMSs (e.g. D2L, Moodle, Canvas). So the best option would be to coordinate with Canvas to accommodate our request.

I assume you were referring to this API for submitting grades in bulk to reduce the number of API calls. We will consider this approach in our future implementations.

James
Community Champion

That is the API call I was talking about.