@nardell,
Each night at 11 pm Central Time, I download certain data through the API to keep a local copy and feed our early alert software. I download the terms, the courses with enrollments, courses with assignments, courses with submissions, the deleted courses, the enrollments, assignment groups, submissions, and analytics. We're a small school, fewer than 1500 FTE I think, and make about 2000 API calls each night. With concurrency I average about 10-11 API calls per second; last night it took 181 seconds to make 2012 calls.
I use Node JS with the bottleneck library to limit my API requests.
With bottleneck, I limit the number of concurrent requests and introduce a minimum delay between calls. The concurrency ranges from 30 to 50, with minimum delays (when needed) of 50 to 100 ms. I also discovered that it's better to ramp up to the full concurrency than to hit it all at once from the start.
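For reference, here's a minimal sketch of that setup, not my exact code; the numbers and the ramp-up schedule are illustrative, and it assumes Node 18+'s built-in fetch and a hypothetical CANVAS_TOKEN environment variable:

```js
const Bottleneck = require('bottleneck');

// Start conservatively rather than opening at full concurrency.
const limiter = new Bottleneck({
  maxConcurrent: 10, // illustrative starting point
  minTime: 100,      // minimum ms between call starts
});

// Ramp up toward the full pool over the first several seconds.
[20, 30, 40, 50].forEach((maxConcurrent, i) => {
  setTimeout(() => limiter.updateSettings({ maxConcurrent, minTime: 50 }), (i + 1) * 2000);
});

// Every API call goes through the limiter.
const getJson = (url) =>
  limiter.schedule(() =>
    fetch(url, { headers: { Authorization: `Bearer ${process.env.CANVAS_TOKEN}` } })
      .then((res) => res.json())
  );
```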
I also vary the per_page from 50 to 100. I do this so that I can predict the pagination where possible. If the API returns the Link headers in a way that lets me determine the last page, I can make all of the remaining page requests at once. Rather than waiting for it to load 100 at a time, which is slower and eats into the remaining limit, I grab a smaller page size and then fetch a number of pages in parallel.
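Something along these lines, reusing the limiter above; parseLinkHeader and fetchAllPages are hypothetical helpers, and auth headers are omitted for brevity:

```js
// Pull the page URLs out of a Link header, e.g. rel="next", rel="last".
function parseLinkHeader(header = '') {
  const links = {};
  for (const part of header.split(',')) {
    const m = part.match(/<([^>]+)>;\s*rel="([^"]+)"/);
    if (m) links[m[2]] = m[1];
  }
  return links;
}

async function fetchAllPages(url) {
  // The first response tells us (via rel="last") how many pages exist.
  const first = await limiter.schedule(() => fetch(url));
  const links = parseLinkHeader(first.headers.get('link'));
  const results = [await first.json()];

  const last = links.last ? Number(new URL(links.last).searchParams.get('page')) : 1;
  if (last > 1) {
    // We know the last page, so fire off pages 2..last in parallel.
    const pages = [];
    for (let page = 2; page <= last; page++) {
      const u = new URL(url);
      u.searchParams.set('page', String(page));
      pages.push(limiter.schedule(() => fetch(u).then((r) => r.json())));
    }
    results.push(...(await Promise.all(pages)));
  }
  return results.flat();
}
```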
When you're making 50 calls at a time, I can't pause an individual call because the rate limit is getting too low; by the time the headers come back, the calls have already been made. So I stop between each new API type and let the in-flight calls finish, which essentially resets the limit to 700.
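In practice that just means awaiting everything for one data type before starting the next. A hypothetical outline (the account and course endpoints are examples):

```js
async function nightlyRun(courseIds) {
  // Each type runs to completion before the next begins, so the
  // x-rate-limit-remaining bucket can refill to 700 in between.
  await fetchAllPages('https://example.instructure.com/api/v1/accounts/1/terms');
  await Promise.all(
    courseIds.map((id) =>
      fetchAllPages(`https://example.instructure.com/api/v1/courses/${id}/assignment_groups`)
    )
  );
  // ...enrollments, submissions, analytics, etc. follow the same pattern.
}
```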
I also monitor the rate limit headers during the calls, on top of the concurrency and delays mentioned above. When x-rate-limit-remaining drops below 300, I compute a delay proportional to how far below 300 it has fallen: 300 / x-rate-limit-remaining × 500 × x-request-cost. If the remaining limit is 250 and the cost was 2.7, that works out to 300/250 × 500 × 2.7 = 1620.
I then sleep for that many microseconds before continuing. With the limits I've put into place, it normally doesn't get that far. Last night's run only got down to 462 for the x-rate-limit-remaining, during some calls to get the assignment groups. The highest x-request-cost I saw was 6.936426743000082, also while fetching the assignment groups.
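As a sketch, the check looks something like this; throttleIfNeeded is a hypothetical helper run after each response, and since setTimeout only has millisecond resolution, the microsecond delay gets rounded up:

```js
const THRESHOLD = 300;

async function throttleIfNeeded(headers) {
  const remaining = parseFloat(headers.get('x-rate-limit-remaining'));
  const cost = parseFloat(headers.get('x-request-cost'));
  if (remaining < THRESHOLD) {
    // e.g. remaining = 250, cost = 2.7 => 300 / 250 * 500 * 2.7 = 1620 µs
    const delayUs = (THRESHOLD / remaining) * 500 * cost;
    await new Promise((resolve) => setTimeout(resolve, Math.ceil(delayUs / 1000)));
  }
}
```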
That suggests that I could speed things up and make additional calls, but 3 minutes isn't that bad and it runs in the middle of the night, so it's not imperative that I push it to the breaking point.
I could attempt to dynamically scale the pool size, but that wouldn't be easy: by the time you figure out that you're running close to exceeding the limit, the calls have already been made. I went with hard-coded limits instead.