Highlighted
Community Member

API Rate Limiting

I need a recommendation / solution.

 

When making Canvas API calls for one of our University Partners, if the University has 20K students (or more), we make 20K API calls to get each student’s LMS record.  Randomly, but often, we get “Unauthorized” and “Forbidden” response messages from the API calls.  Based on my research, this is because of the API Rate Limiting that Canvas enforces (https://community.canvaslms.com/docs/DOC-8381-api-rate-limiting).

 

What is the best way to call the APIs if I want to get the LMS record for every student, for a given partner, in our database?

The end-game is to be able to successfully make a Canvas API call for every student in our database, for a given partner, in one thread.

 

I hope this is clear.

Would love to hear your input and recommendation.

0 Kudos
9 Replies
Highlighted
Navigator

kenneth.robinson@academicpartnerships.com 

The safest way is to make one request at a time, sequentially. Canvas has said that you should not exceed your limit doing that.

It is unreasonable with 20k students to do that.

You can pay attention to the x-rate-limit-remaining and the x-request-cost headers. In theory, this is more difficult than it seems, perhaps depending on the library you're using to make the API calls. You can make concurrent calls, but the penalty of 50 charged at the start of each request limits how many can be made at a time.
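As a rough illustration of watching those two headers, here is a minimal Python sketch. The header names are the ones mentioned above; the function name, the plain-mapping input, and the `low_water_mark` threshold of 200 are assumptions made up for the example, not anything Canvas prescribes.

```python
def read_rate_limit(headers, low_water_mark=200.0):
    """Return (remaining, cost, should_pause) from Canvas response headers.

    `headers` is any mapping of header name to string value.
    `low_water_mark` is an arbitrary, assumed threshold for slowing down.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = float(lowered.get("x-rate-limit-remaining", "nan"))
    cost = float(lowered.get("x-request-cost", "nan"))
    # Comparing NaN is always False, so missing headers never trigger a pause.
    should_pause = remaining < low_water_mark
    return remaining, cost, should_pause
```

Most HTTP libraries expose response headers as a mapping, so the result of a call can usually be passed in directly and the caller can sleep for a moment whenever `should_pause` is true.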

The best solution I have found is to stagger the requests so that Canvas has time to calculate the costs and not apply that 50 penalty all at once. The delay doesn't have to be great, 25-50 ms depending on the call that you're making, but making it longer is safer and less likely to get an error. I say that because some calls are more expensive than others. I also request per_page=50 in most cases since it's quicker to get 50 than 100.

  • For enrollment data, I allow up to 50 simultaneous requests but stagger them by 100 ms.
  • For the terms, I allow 30 simultaneous requests but stagger them by 100 ms.
  • For the courses, I allow 30 simultaneous requests but stagger them by 50 ms.
  • For the assignments, I allow 40 simultaneous requests but stagger them by 100 ms.
  • For submissions, I allow 30 at a time with a delay of 250 ms. This is because I'm fetching submission_history and that can be really large.

In most cases, the delay is the limiting factor, not the number of concurrent requests allowed. If I delay each request by 50 ms, then I can only make 20 per second, so the concurrent limit would only come into play if they take longer than a second to complete.
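The staggering idea can be sketched roughly like this in Python with asyncio. This is not the code I actually run, just a minimal illustration: `run_staggered`, `worker`, and the default values are hypothetical names and numbers, and `worker` stands in for whatever coroutine makes the real API call.

```python
import asyncio


async def run_staggered(items, worker, max_concurrent=30, stagger_s=0.05):
    """Start one worker per item, spacing the starts by stagger_s seconds
    and never letting more than max_concurrent run at the same time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        try:
            return await worker(item)
        finally:
            semaphore.release()  # free the slot when the request finishes

    tasks = []
    for item in items:
        await semaphore.acquire()        # wait for a free slot
        tasks.append(asyncio.create_task(guarded(item)))
        await asyncio.sleep(stagger_s)   # stagger the next start
    # Results come back in the same order as the items.
    return await asyncio.gather(*tasks)
```

With `stagger_s=0.05` this starts at most 20 requests per second, so the `max_concurrent` cap only matters when individual requests take longer than the stagger would otherwise allow.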

Along the way, I allow each type of request to empty the queue before starting the next type. That allows me to get back up to the 700 limit for each new type. In another program I wrote recently, I started downloading the user list (only 20 at a time, I think) and then making calls off of it before I finished downloading the entire user list.

The code I described took 230 seconds to make 919 requests last night. We're slow because it's summer. In a regular term, it might take 13 minutes to run. I haven't gotten any error messages with timeouts and I've been running this nightly for about 1.5 years now.

We are also a much smaller school than the ones you are wanting to handle, but I don't fetch individual user information as I can get what I need as part of another call. You may also be able to incorporate the GraphQL interface, which allows you to get select information from multiple tables in one call rather than having to make calls to each of those APIs. You will still have to mess with pagination, though.

This question isn't really as helpful as you may think it is.

What is the best way to call the APIs if I want to get the LMS record for every student, ...

What do you mean by "LMS record for every student" ?

There is user information, there is enrollment data, there is submission data, there are analytics, there are ...

Knowing what you're trying to fetch can help figure out the best way to get it.

Another data source you may be able to use is Canvas Live Events. Keep the data on your end and let Canvas tell you when it changes. Then you don't have to download the complete set of data every time.

0 Kudos
Highlighted

Thanks for the reply. Very helpful information.

We are getting user, terms, course, enrollment and submission data as well.

Getting the user data is where the issue seems to be.

From your reply, I get the following:

1. No threading within a University Partner / Token

2. 20K + requests is going to take some time.

3. We have some University Partners that have student populations of 25K+. These requests need to be sequential and staggered/throttled.

Question:

1. Is there an API to get multiple user records at one time? An API that can take a list of user/student IDs and return a list of user objects? I know you mentioned GraphQL to do something similar.

Thanks,

Ken

0 Kudos
Highlighted

I don't think your first take-away was what I was saying. Canvas has said that if you're making one call at a time, you won't hit the limit and it will basically stay at 700 (it refills faster than you can use it in a single threaded environment). That does not mean that you have to make calls one at a time, just that if you're only making one at a time then you don't have to worry about the rate limiting.

There are several API calls that will get multiple user records at one time. The one I use is the List enrollments endpoint. I get a list of courses and then fetch the enrollments for those courses. That works for me because I only need the students that are actually enrolled in courses for the purpose I'm gathering data.

I fetch the items in this order.

  1. terms so that I only process courses from current or upcoming terms.
  2. courses
  3. enrollments for each course. This gives me the section data, the user information, the grade in course, the total time in the course, and the last activity in the course.
  4. assignment_groups for each course (this gives all graded assignments with one call rather than the multiple ones necessary with the assignments call).  I include[] assignments, but exclude_response_fields[] of rubrics and description and set override_assignment_dates to false (I don't need to mess with assignment overrides).
  5. submissions for each course using the multiple submissions endpoint. I use student_ids[]=all and a bunch of other parameters.
  6. student_summaries for each course using the analytics api

I guess only items 1-2 need to be first. The others could happen in any order. You'll notice that I get a lot of information from the enrollments API.
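To make the ordering concrete, here is a small Python sketch that just builds the request paths for each phase. The paths are written from my memory of the REST documentation, so check them against the docs before relying on them; the function names and the IDs passed in are hypothetical, and only a couple of the query parameters mentioned above are shown.

```python
def account_phase_paths(account_id):
    """Items 1-2: fetched once per account, before anything else."""
    return [
        f"/api/v1/accounts/{account_id}/terms",
        f"/api/v1/accounts/{account_id}/courses",
    ]


def course_phase_paths(course_id):
    """Items 3-6: fetched once per course; these can run in any order."""
    base = f"/api/v1/courses/{course_id}"
    return [
        f"{base}/enrollments",
        f"{base}/assignment_groups?include[]=assignments",
        f"{base}/students/submissions?student_ids[]=all",
        f"{base}/analytics/student_summaries",
    ]
```

The account-level paths have to finish first because the course IDs come out of them; after that, each course's paths can go into whatever queue you use for throttling.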

The Canvas implementation of GraphQL is not as full-featured as some. Notably, filtering isn't widely supported, so you cannot say "give me all students who have spent 0 time in the course."  I don't use GraphQL much because what I need is available through the REST API. However, as Canvas moves forward, more development is taking place through GraphQL and not necessarily being backported to the REST API.

There is one API call that will give you a list of users if you don't want to garner it from enrollments or some other API call. That's the list users in account endpoint. It does not allow you to give it a list of user IDs (the search is limited to 1 item). It does not allow you to tack on extra information that you would like to have -- that's why I don't use it for my data warehouse. I do use it in my nightly processing of SIS data to make sure that no records fell through the cracks.

0 Kudos
Highlighted

James,

Updated.

Understood.

Thanks.

I think I need to review the enrollment API and see if I can reduce the number of calls I am making by getting user data with the enrollment data; instead of making user API Calls separately.

Bottom line…

1. For a University Partner, we need to retrieve from Canvas the LastLoginTime for each user we have in our database.

  • We need to know whether a user has logged in to their online degree plan. We are simply validating that they are active in their online degree program.

2. The reason I call the User API now is we don’t have the user id that is used to get the enrollments, courses, terms, etc.

  • We have our User ID.

  • That ID is on the user record in Canvas, and we get the Canvas User ID when we do an API call with our user ID.

  • Then we take the Canvas User ID and do the other lookups.

Lastly, when calling the User API, calls will be going fine, then all of a sudden I will start getting an Unauthorized response. Is there a limit on the life of the token being used? I am using a token provided by the University Partner.

Ken

0 Kudos
Highlighted

If you are using OAuth to get a token, then the default time is 1 hour and you will need to check a header to see if the forbidden is due to an expired token or a lack of permissions.

The OAuth2 page in the API documentation provides more details.

0 Kudos
Highlighted

Thanks.

I have the “Forbidden” responses taken care of.

What you suggested is dead on.

I get an “Unauthorized” response however.

After several calls have been working.

Same token. Same url; with the only difference being the user ID.

I can take the same URL (I get it while debugging) and try it in a REST client, like Postman, and it works fine.

0 Kudos
Highlighted

I was using forbidden loosely. Unauthorized is the actual status code. See the section on "Storing Tokens" in the link I gave.

I don't use Postman; maybe it refreshes the OAuth token for you?
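The header check I have in mind could look something like the Python sketch below. It assumes my reading of the "Storing Tokens" section is right: a 401 that carries a WWW-Authenticate header means the token has expired or been deleted, and a 401 without it means the user lacks permission. Please verify that against the OAuth2 page; the function name and the return labels are made up for the example.

```python
def classify_401(status_code, headers):
    """Guess why a request was rejected, based on status and headers.

    Assumption: an expired or deleted token yields 401 plus a
    WWW-Authenticate header; a plain 401 is a permissions problem.
    """
    if status_code != 401:
        return "not_401"
    # Header names are case-insensitive, so compare in lowercase.
    header_names = {name.lower() for name in headers}
    if "www-authenticate" in header_names:
        return "token_expired_or_invalid"
    return "not_permitted"
```

If it comes back as `token_expired_or_invalid`, get a fresh access token and retry the same call rather than treating it as a failure for that user.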

0 Kudos
Highlighted

If the question really is:

For a University Partner, we are needing to retrieve from Canvas the LastLoginTime for each user we have in our database

 

  • We are needing to know if a user has logged in to their online degree plan. Simply validating if they are active in their online degree program.

Why not get the university to export the data from the Canvas Data Export? They should be able to give you a filtered set of the user_ids that have accessed a given course in the last xxx hours, or even just a list of the LastLoginTime per user_id.

 
