nardell
Community Participant

Bookmarking for Enrollments Index API in upcoming release

This question may be covered in the release notes and I missed it. One of the pending changes is the "Bookmarking for Enrollments Index API". It seems that this change will be applied to production Canvas instances on June 17. I believe I understand the effect of the change: the Link section in the response header will no longer provide a range of pages, as in:

<https://your.institution.instructure.com/api/v1/courses/:course_id/enrollments?page=1&per_page=100>; rel="current",<https://your.institution.instructure.com/api/v1/courses/:course_id/enrollments?page=2&per_page=100>; rel="next",<https://your.institution.instructure.com/api/v1/courses/:course_id/enrollments?page=1&per_page=100>; rel="first",<https://your.institution.instructure.com/api/v1/courses/:course_id/enrollments?page=3&per_page=100>; rel="last"

Instead links will be returned as:

<https://your.institution.instructure.com/api/v1/courses/:course_id/enrollments?page=first&per_page=100>; rel="current",<https://your.institution.instructure.com/api/v1/courses/:course_id/enrollments?page=bookmark:WyJTdHVkZW50RW5yb2xsbWVudCIsIkxhc3RuYW1lLCBGaXJzdG5hbWUgIiwxMjM0NTY3XQ==&per_page=100>; rel="next",<https://your.institution.instructure.com/api/v1/courses/:course_id/enrollments?page=first&per_page=100>; rel="first"

From then on, it will be necessary to follow the links one page at a time in any API code that uses the Enrollments endpoint. I just want to make sure I have the story correct. Since this is a potentially breaking change for an institution's integrations (if an integration relies on explicit page numbers in its page-following scheme), it seems Instructure is providing a period of time for institutions to accommodate the change, and that period is coming to an end soon. I am curious whether anyone knows if Instructure is moving away from providing page ranges in REST responses. Though I do not depend on the page numbers in production code, I appreciated that they allow for parallel execution of API requests.

Thanks,

Michael Nardell


Accepted Solutions
James
Community Champion

 @nardell  

If you look at the code used for the bookmarking, it generates SQL to use as a starting point the next time you make the call, based on where it left off with the current data. If my key for sorting was id and I left off with id=1234, then the bookmark for the next page would translate into a WHERE id > 1234. It's more complicated than that, but that appears to be the gist of it.

For the enrollments API, it uses enrollment.type, user.sortable_name, and enrollment.id as the key. In this case, the information returned after a bookmark could change. If I leave off with a StudentEnrollment for "Smith, John" who has an enrollment_id of 1234, then it generates code something like this:

WHERE
(type > 'StudentEnrollment') OR
(type = 'StudentEnrollment' AND sortable_name > 'Smith, John') OR
(type = 'StudentEnrollment' AND sortable_name = 'Smith, John' AND id > 1234)

Looking at that, there isn't anything immutable about it. Between the time I made the first call and the time I made the second call, I might have added "Doe, Jane" as a StudentEnrollment and I would never get her out of the call because I had already passed that point. Likewise, I might delete the enrollment for "Thomas, Jordan" after I had fetched it and so when I went back to make that call again, I would get different data than I did the first time.

In the case of Jane, she is missing until the next time you fetch the enrollments. Jordan remains there even though he's been deleted.
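To make the Jane case concrete, here is a minimal Python sketch of the same idea. It is not the Canvas implementation: it stands in a plain list of (type, sortable_name, id) tuples for the enrollments table, and the names and ids other than "Smith, John" and 1234 are made up. Python's tuple comparison plays the role of the three-part WHERE clause, although a real database collation may order names differently.

```python
def page_after(rows, bookmark, per_page):
    """Return the next page of rows that sort strictly after `bookmark`.

    `rows` are (type, sortable_name, id) tuples; `bookmark` is the last
    tuple of the previous page, or None for the first page.
    """
    ordered = sorted(rows)
    if bookmark is not None:
        # Tuple comparison mirrors the chained type / name / id conditions.
        ordered = [row for row in ordered if row > bookmark]
    return ordered[:per_page]


# Hypothetical data for illustration only.
rows = [
    ("StudentEnrollment", "Adams, Amy", 1001),
    ("StudentEnrollment", "Smith, John", 1234),
    ("StudentEnrollment", "Thomas, Jordan", 1300),
    ("TeacherEnrollment", "Lee, Pat", 900),
]

first_page = page_after(rows, None, 2)
bookmark = first_page[-1]  # the "Smith, John" row

# Jane is enrolled between the first and second request.
rows.append(("StudentEnrollment", "Doe, Jane", 1400))

second_page = page_after(rows, bookmark, 2)
seen_names = [name for _, name, _ in first_page + second_page]
print(seen_names)
```

Jane sorts before the bookmark, so she does not show up on any later page of this pass, and nobody is returned twice.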

This could have happened without using bookmarks as well. If John was record number 20 (page=2&per_page=10) and Jane gets added, then John gets duplicated since he's now record number 21. Jane is still missing, though. You wouldn't get the duplication with the bookmark approach, so at least it fixes that issue.
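The duplication with numbered pages can be sketched the same way. This is only an illustration with an in-memory list and a smaller page size than the per_page=10 above; the names other than John and Jane are invented.

```python
def page_by_number(names, page, per_page):
    """Return one page of the sorted list using a numeric offset."""
    ordered = sorted(names)
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


# Hypothetical roster for illustration only.
names = ["Adams, Amy", "Smith, John", "Thomas, Jordan", "Young, Kim"]

page1 = page_by_number(names, 1, 2)  # John is the last record on page 1

# Jane is added before the second request and sorts ahead of John.
names.append("Doe, Jane")

page2 = page_by_number(names, 2, 2)  # John has been pushed onto page 2
print(page1, page2)
```

John is returned on both pages, and Jane landed on a page that had already been fetched, so she is still missed.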

In reality, no one is likely to be jumping straight to page 3 as their first request for a list of enrollments. Similarly, you shouldn't use a bookmarked page a month after it was generated (if they're around that long).

The history on the patch to use bookmarks says it's to fix the sort. Fixing the sorting wasn't what I remembered from the announcement, though. It was to improve the performance (or something like that) of the servers. That may well have been because too many people were hitting the API with parallel requests. The enrollments API is more costly than some to generate. When I was updating my code to download the access report for every student in a class, it was the list of the users with enrollments that was the expensive call. Getting the access report data wasn't costly at all. That might be why they chose to do this to enrollments.

For a long time, I've checked the Link header for the presence of a rel="last" link and whether it contains a numeric page parameter (I've got some code I probably need to double check and probably update). If the numeric page was present, then I would take advantage of the parallelism. If not, then I would make the calls sequentially.
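A check like that might look roughly like the following Python sketch. It is not my actual code; the function name is made up, and it only assumes the Link header format shown in the question.

```python
import re
from urllib.parse import parse_qs, urlparse


def last_page_number(link_header):
    """Return the numeric page of the rel="last" link, or None.

    None means there is no "last" link or its page value is not a plain
    number (for example page=bookmark:...), so the caller should fall
    back to following the "next" links sequentially.
    """
    match = re.search(r'<([^>]+)>;\s*rel="last"', link_header or "")
    if match is None:
        return None
    page = parse_qs(urlparse(match.group(1)).query).get("page", [""])[0]
    return int(page) if page.isascii() and page.isdigit() else None


numbered = (
    '<https://your.institution.instructure.com/api/v1/courses/1/enrollments?page=1&per_page=100>; rel="current",'
    '<https://your.institution.instructure.com/api/v1/courses/1/enrollments?page=3&per_page=100>; rel="last"'
)
bookmarked = (
    '<https://your.institution.instructure.com/api/v1/courses/1/enrollments?page=first&per_page=100>; rel="current",'
    '<https://your.institution.instructure.com/api/v1/courses/1/enrollments?page=first&per_page=100>; rel="first"'
)
print(last_page_number(numbered), last_page_number(bookmarked))
```

When a number comes back, pages 2 through that number can be requested in parallel; otherwise the sequential path is the only safe one.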

Some people have commented that they just put in a sequence of page numbers and rely on getting an error when the data is exhausted, without looking at the link headers at all. That's not a good way to do this. They will definitely get bitten if they're using this endpoint, and there's a good chance that Canvas will do this to other endpoints in the future.
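For comparison, following the link headers sequentially takes only a few lines. The sketch below is an assumption-laden illustration: `fetch` is a hypothetical callable that takes a URL and returns the decoded items together with the raw Link header string, so any HTTP client can be wrapped to fit it.

```python
import re

LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def parse_link_header(header):
    """Map each rel name in a Link header value to its URL."""
    return {rel: url for url, rel in LINK_RE.findall(header or "")}


def fetch_all(first_url, fetch):
    """Collect every item by following rel="next" until it disappears.

    `fetch` is a hypothetical callable: url -> (items, link_header).
    The URLs are treated as opaque, so numbered and bookmarked pages
    are handled the same way.
    """
    items = []
    url = first_url
    while url:
        page_items, link_header = fetch(url)
        items.extend(page_items)
        url = parse_link_header(link_header).get("next")
    return items


# Stand-in for real HTTP calls, for illustration only.
fake_pages = {
    "page-a": (["Adams", "Smith"], '<page-a>; rel="current",<page-b>; rel="next",<page-a>; rel="first"'),
    "page-b": (["Thomas"], '<page-b>; rel="current",<page-a>; rel="first"'),
}
print(fetch_all("page-a", lambda url: fake_pages[url]))
```

Because the loop never inspects the page parameter, it keeps working if an endpoint switches from page numbers to bookmarks.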


Right now, it doesn't appear to be a wholesale change. The code to switch it over to a collection and bookmarks wasn't trivial.

For what it's worth (I didn't know this until I was preparing this response), a bookmark is a Base64URL encoding of the stringified JSON of the data structure that contains the key information from where the request left off. You can take it and paste it into an online Base64URL decoder (such as the one at Base64Decode.org) and find out what information is contained in it. It is not a bookmark saved to a database somewhere that is looked up when received to find out what the information is.

If someone wanted to, they could say "Give me all of the enrollments after 'Smith, John'" without knowing what page that happened with. That's something you cannot do with the numbered pages. Any extra padded equal signs at the end can be ignored. In my limited testing, you do have to put something in for the enrollment_id, but it doesn't have to be correct.

I put in ["StudentEnrollment","Smith, John",0] and got the Base64URL encoded value of WyJTdHVkZW50RW5yb2xsbWVudCIsIlNtaXRoLCBKb2huIiwwXQ==. When I tried page=bookmark:WyJTdHVkZW50RW5yb2xsbWVudCIsIlNtaXRoLCBKb2huIiwwXQ, I picked up with the first person after John. 
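The same round trip can be done without an online tool. This is a small Python sketch using only the standard library; the function names are made up, and the compact JSON separators are an assumption chosen because they reproduce the value above.

```python
import base64
import json


def encode_bookmark(key):
    """Encode a key such as [type, sortable_name, id] as a bookmark token."""
    raw = json.dumps(key, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_bookmark(token):
    """Decode a bookmark token, with or without its trailing '=' padding."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


token = encode_bookmark(["StudentEnrollment", "Smith, John", 0])
print(token)
print(decode_bookmark(token.rstrip("=")))
```

The token then goes after page=bookmark: in the request URL.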

I don't know how useful that is, but it explains a little mystery I had about just what a bookmark was.

Strangely, when I decode the bookmark you provided, it has "StuUentEnrollment" instead of "StudentEnrollment".

3 Replies
nardell
Community Participant

There is an interesting question lurking in the background: what promises of immutability are made about the data returned by bookmarked indexes? In the case of the Gradebook History and Grade Change Log APIs (both return bookmarked links), we can expect the underlying data to remain immutable. Enrollment data does change, and I have verified that a bookmarked URL will return different data when there are changes to class enrollments (the bookmark does not name a value but is a reference to a parcel of mutable state).

nardell
Community Participant

Thanks James Jones

This is incredibly helpful. I suppose I wanted to read some deep meaning into the notion of bookmarks, wanting them to be snapshots of enrollments that existed at a particular moment in time. But you have demonstrated that is not the case. In particular, the fact that the bookmark is an encoding of the object type and a marker in the list of enrollments (where the last page left off) makes it really clear. What made your comments particularly helpful is that you provided the context behind the decision to make this change.

As an aside, I happened to re-read the Pagination section of the API documentation, which admonishes that index tokens in the links should be regarded as "opaque". I vaguely remember reading that but came to ignore it when I realized that page numbers could be used to get a range of URLs for parallel retrieval. I suppose there is no guarantee that cardinal page numbers will remain (or any other information that we may leverage). Also, thanks for the pointers to the Canvas source code. I will need to look at the source code for the Grade Change Log API, since that is another topic I am interested in. As for the mangled decoding of my bookmark: I had tweaked the bookmark, thinking (incorrectly) that it would mask any institutional information. I will edit it so that no user info is contained in my post.

Mike