I wanted to ask about best practices for playing nice when running concurrent API calls. I am aware of the Canvas API's Rate Throttling Model and am looking for the most effective way to scale API calls to avoid exceeding quotas.

I am using a resource pool of open HTTP connections (currently only GET operations), but I make no attempt to introduce a delay between successive HTTP calls. When my resource pool is fully loaded, all of the HTTP calls are made in very rapid succession; then, as the process runs, the varying response times introduce a natural delay that controls when the pool is refilled. The resource pool is keeping me from exceeding the quota, but I still need to look at smart ways of dynamically scaling the size of this pool to account for variability on a production system.

As I prepare to move from testing on a non-production instance of Canvas to production, I am considering whether my approach is production ready (just because I can make a bunch of HTTP calls rapidly does not mean I should). I am also wondering whether I should avoid running a concurrent process during peak use hours: is any process that comes close to hitting the rate throttle limit going to noticeably degrade system performance for other users? I would be interested in others' input on these considerations.
Mike
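A fixed-size pool of the kind Mike describes might be sketched in Python with `concurrent.futures` (this is an illustration, not Mike's actual code; `BASE_URL`, `TOKEN`, and the course IDs are placeholders, and `POOL_SIZE` is just a starting point to tune):

```python
# A sketch of a fixed-size pool of concurrent GETs using Python's
# concurrent.futures. BASE_URL, TOKEN, and the course IDs below are
# placeholders; POOL_SIZE caps how many requests are in flight at once.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

BASE_URL = "https://canvas.example.edu/api/v1"  # placeholder instance
TOKEN = "YOUR_ACCESS_TOKEN"                     # placeholder token
POOL_SIZE = 10                                  # max in-flight requests

def build_request(path):
    """Attach the base URL and auth header to a relative API path."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": "Bearer " + TOKEN},
    )

def fetch(path):
    with urllib.request.urlopen(build_request(path)) as resp:
        return resp.read()

def fetch_all(paths):
    """Issue GETs with at most POOL_SIZE requests in flight at once."""
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        return list(pool.map(fetch, paths))

# Example: fetch_all([f"/courses/{c}" for c in (101, 102, 103)])
```

As the thread goes on to discuss, a pool cap alone paces calls only indirectly; adding a minimum delay between starts is the other half of the picture.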
According to this document, the API rate limits are set based on an individual token, not on an institution or account. So, if you have several people using an app, each logged in via OAuth (or with their own generated token), each token is given the same quota.
The document goes into more detail, but each request is returned with an X-Rate-Limit-Remaining header you can check as part of your program logic to scale up/down accordingly.
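As a tiny illustration of that check (the 300 threshold here is an assumption for the example, not Canvas guidance):

```python
# Minimal sketch of checking the X-Rate-Limit-Remaining header after
# each response; the low_water threshold is an assumed value, not a
# Canvas recommendation.
def should_back_off(headers, low_water=300):
    """True when X-Rate-Limit-Remaining has dropped below the threshold."""
    remaining = float(headers.get("X-Rate-Limit-Remaining", low_water))
    return remaining < low_water
```

When it returns True, the caller can sleep briefly or shrink its worker pool before issuing the next request.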
From my own experience, I'm using a Flask app to poll assignment and outcome scores and then posting information back, sometimes for large classes, and I have not run into rate limiting issues. I use a Python Pool to run several concurrent processes.
Thanks Brian - my work with running API calls concurrently has been promising. I've mostly used Python for my API work, but for the concurrent pieces I've been experimenting with Clojure (for its support for Communicating Sequential Processes-style programming). In any case, thanks for the advice!
@nardell ,
I am downloading copies of certain data through the API each night for a local copy and our early alert software. It runs at 11 pm Central Time. I download the terms, the courses with enrollments, courses with assignments, courses with submissions, the deleted courses, the enrollments, assignment groups, submissions, and analytics. We're a small school, less than 1500 FTE I think, and make about 2000 API calls each night. With concurrency I average about 10-11 API calls per second: it took 181 seconds last night to make 2012 calls.
I use Node JS with the bottleneck library to limit my API requests.
With bottleneck, I limit the number of concurrent requests and introduce a minimum delay between calls. The concurrency ranges from 30 to 50 with minimum delays (when needed) of between 50 and 100 ms. I discovered that it's better to ramp up to the full concurrency rather than starting off hitting it all at once.
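James's setup uses Node's bottleneck library; a rough Python analogue of those two knobs, a concurrency cap plus a minimum gap between request starts (values illustrative, ramp-up omitted for brevity), might look like:

```python
# A Python analogue (not James's actual Node/bottleneck code) of the
# limiter described above: a semaphore caps in-flight requests, and a
# shared gate enforces a minimum gap between request start times.
import threading
import time

class Limiter:
    def __init__(self, max_concurrent=30, min_gap=0.05):
        self._slots = threading.Semaphore(max_concurrent)
        self._gate = threading.Lock()
        self._next_start = 0.0
        self._min_gap = min_gap

    def run(self, fn, *args):
        self._slots.acquire()          # wait for a free concurrency slot
        try:
            with self._gate:           # space out the start times
                now = time.monotonic()
                if now < self._next_start:
                    time.sleep(self._next_start - now)
                self._next_start = time.monotonic() + self._min_gap
            return fn(*args)
        finally:
            self._slots.release()
```

Ramping up, as James suggests, would mean starting `max_concurrent` small and growing it as early responses confirm the quota is healthy.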
I also vary the per_page from 50 to 100. I do this so that I can predict the pagination where possible. If the API returns the links in a way that lets me determine the last page, I can then make all of the remaining API calls at once. Rather than waiting for it to load 100, which is slower and eats into the remaining limit, I grab a smaller page and then make a number of calls in parallel.
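That pagination trick can be sketched like this (Python rather than James's Node code, assuming the standard `Link` header format the Canvas API returns):

```python
# Sketch of the pagination trick: fetch page 1, read the rel="last"
# entry from the Link header, and derive every page URL so the
# remaining pages can be requested in parallel rather than walked
# sequentially.
import re
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def last_page(link_header):
    """Extract the page number of the rel="last" link, or None."""
    for part in link_header.split(","):
        m = re.search(r'<([^>]+)>;\s*rel="last"', part.strip())
        if m:
            qs = parse_qs(urlparse(m.group(1)).query)
            return int(qs["page"][0])
    return None

def page_urls(first_url, n_pages):
    """Build the URL for every page by swapping the page parameter."""
    parts = urlparse(first_url)
    qs = parse_qs(parts.query)
    urls = []
    for page in range(1, n_pages + 1):
        qs["page"] = [str(page)]
        urls.append(urlunparse(parts._replace(query=urlencode(qs, doseq=True))))
    return urls
```

If the API only returns a rel="next" link (as it does for some endpoints), `last_page` returns None and the pages have to be walked one at a time.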
When you're making 50 calls at a time, you can't pause once the rate limit gets too low; the calls have already been made. So I stop with every new API type and let it finish, essentially letting the limit reset to 700.
I do some monitoring of the limiting during the calls, subject to the concurrency and time delays mentioned above. When the x-rate-limit-remaining drops below 300, I compute 300 / x-rate-limit-remaining and multiply that by 500 times the x-request-cost value. If the remaining limit is 250 and the cost was 2.7, then I would come up with 300 / 250 * 500 * 2.7 = 1620.
I then sleep for that many microseconds before continuing. With the limits I've put into place, it normally doesn't get that far. Last night's run only got down to 462 for the x-rate-limit-remaining. That was during some calls to get the assignment groups. The highest any x-request-cost I saw was 6.936426743000082, also during the fetching of the assignment groups.
That suggests that I could speed things up and make additional calls, but 3 minutes isn't that bad and it runs in the middle of the night, so it's not imperative that I push it to the breaking point.
I could attempt to dynamically scale the pool size, but it wouldn't be easy: by the time you figure out that you're running close to exceeding the limit, the calls have already been made. I went with hard-coded limits instead.
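Spelled out in Python (a sketch; James's implementation is in Node), the backoff arithmetic from his post works out to:

```python
# James's backoff arithmetic as described above: once the remaining
# limit drops below 300, sleep 300 / remaining * 500 * request-cost
# microseconds before the next call.
def backoff_microseconds(remaining, request_cost, floor=300, scale=500):
    """Microseconds to pause before the next call; 0 while above the floor."""
    if remaining >= floor:
        return 0.0
    return floor / remaining * scale * request_cost
```

With remaining = 250 and cost = 2.7 this reproduces the 1620-microsecond figure from his example.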
@James Thank you! You gave me a lot to work with here. I will look at introducing the "tunable" delay between successive API calls. If I understand correctly, the delay helps mitigate the "pre-flight" penalty applied when concurrent API calls hit the server as an immediate block. I also like the idea of a gradual ramp-up in the rate of calling, the pool size, or both.