Pagination makes endless API calls when retrieving page_views

erkaner
Community Novice

I have the following code to retrieve the page views of the user with id=123456. The very same code works when I retrieve a user's entries in a discussion forum. However, when I change the URI to https://learn.canvas.net/api/v1/users/123456/page_views to retrieve page views, it returns the same records over and over again, no matter what the page number is. There may be no further data, but the request keeps returning records, so the while loop never ends. Do you see any problems with the code?

npagina=1
control=0
while control==0:
    uri = 'https://learn.canvas.net/api/v1/users/123456/page_views?per_page=100&page=' + str(npagina)
    r = requests.get(uri, headers=headers)
    raw = r.json()  #if no new data, then continues to retrieve duplicate data
    if raw != "":
        views = pd.DataFrame(raw)
        if 'id' in views.columns:
            npagina=npagina+1
            page_views = page_views.append(views)
        else:
            control = 1

2 Solutions
dgrobani
Community Champion

I found the Pagination section of the API documentation very helpful. Here's how I retrieve paginated data in Python:

r = get('{}?per_page=100'.format(url))
paginated = r.json()
while 'next' in r.links:
    r = get(r.links['next']['url'])
    paginated.extend(r.json())
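
For a slightly fuller picture, here is a minimal sketch of the same idea wrapped in a generator. It assumes `session` behaves like a `requests.Session`, meaning a `get()` method whose responses expose `.links`, `.json()`, and `.raise_for_status()`. The name `iter_paginated` is made up for illustration.

```python
def iter_paginated(session, url, params=None):
    """Yield every record from a paginated endpoint by following rel="next" links.

    `session` is assumed to behave like requests.Session (hypothetical wiring).
    """
    response = session.get(url, params=params)
    while True:
        # Fail loudly on HTTP errors instead of looping on an error payload.
        response.raise_for_status()
        yield from response.json()
        # The parsed Link header; there is no "next" entry on the final page.
        next_link = response.links.get("next")
        if next_link is None:
            break
        # Use the URL exactly as the server supplied it; never build it by hand.
        response = session.get(next_link["url"])
```

With the requests library this might be called as `list(iter_paginated(session, url, {"per_page": 100}))`, where `session` is a `requests.Session()` that already carries the Authorization header.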

I hope this helps.

EDITS: fixed typo; refactored


James
Community Champion

Page views are treated differently than some of the other requests as they can change pretty quickly and you can get new page views added before you make the request to get the next page. That means that information that was in the first page of results might get shifted down by incoming requests and reappear in the second page of results as well.

To compensate, Canvas puts a bookmark: value in the page= parameter rather than a specific number. Here are the contents of the Link response header (I've reformatted them and removed portions so more is visible).

CURRENT:
page=first&per_page=10
NEXT:
page=bookmark:WyIyMDE3LTA5LTE0VDA4OjU3OjQ5LjM2MC0wNTowMCIsIjNhOTVmMzIxLTc5Y2UtNDcyOC04N2U0LTczNDYxN2Y0ZWJhNiJd&per_page=10
FIRST:
page=first&per_page=10

Also notice that there is no rel="last" supplied here.

It is important that the second fetch contain that page=bookmark:token or it's considered a different request.


Your requests don't contain it, but following the next link as dgrobani recommended will pick it up. This matters all the more because the bookmark changes every time you fetch another page, so there is no way to predict what it will be the next time. Page views have no concept of a page number; everything is based on that bookmark.

CURRENT:
page=bookmark:WyIyMDE3LTA5LTE0VDA4OjU3OjQ5LjM2MC0wNTowMCIsIjNhOTVmMzIxLTc5Y2UtNDcyOC04N2U0LTczNDYxN2Y0ZWJhNiJd&per_page=10
NEXT:
page=bookmark:WyIyMDE3LTA5LTEzVDE4OjQ0OjQ3Ljk3MC0wNTowMCIsIjU4OWQ4YTAyLTk3OGEtNGE5ZS1hM2EwLTVkYTZhY2ZjOTQyMyJd&per_page=10
FIRST:
page=first&per_page=10
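
As an aside, the bookmark tokens in this thread happen to decode as base64-encoded JSON holding a timestamp and an identifier. That suggests why there is no page number to increment: the token marks a position in the data rather than an offset. This is only an observation about these particular tokens, not documented behaviour, so treat bookmarks as opaque and always take them from the next link. The helper name `decode_bookmark` is hypothetical, and the URL-safe base64 alphabet is an assumption.

```python
import base64
import json

def decode_bookmark(page_value):
    """Decode the payload of a 'bookmark:<token>' page value (illustration only)."""
    token = page_value.split(":", 1)[1]
    # Pad to a multiple of 4 in case a token is emitted without '=' padding.
    token += "=" * (-len(token) % 4)
    # Assumption: the token is base64-encoded JSON, as the ones in this thread are.
    return json.loads(base64.urlsafe_b64decode(token))

# The NEXT value from the first set of links above.
first_next = (
    "bookmark:WyIyMDE3LTA5LTE0VDA4OjU3OjQ5LjM2MC0wNTowMCIsIjNhOTVmMzIx"
    "LTc5Y2UtNDcyOC04N2U0LTczNDYxN2Y0ZWJhNiJd"
)
print(decode_bookmark(first_next))
# ['2017-09-14T08:57:49.360-05:00', '3a95f321-79ce-4728-87e4-734617f4eba6']
```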

That reminds me that I need to add this to my list of ways that pagination is handled in Canvas. I'm revising how I fetch data: I grab the rel="last" link and then iterate over the pages between 2 and "last". That works in some cases, but it won't work here, where the pages can only be fetched in series rather than in parallel.

Also, I'd watch out for the page_views API, as the results can go on for a really long time. Depending on what you're looking for, you might want to specify dates in the original query or stop fetching once you reach the desired point.
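
Both options can be sketched together. The example below assumes the `start_time` and `end_time` query parameters that the page views endpoint accepts, to my understanding of the Canvas API documentation, and a `session` object that behaves like a `requests.Session`. The function name `iter_page_views` and its arguments are hypothetical.

```python
from itertools import islice

def iter_page_views(session, api_root, user_id,
                    start_time=None, end_time=None, per_page=100):
    """Lazily yield page views for one user, following rel="next" links."""
    params = {"per_page": per_page}
    # Assumed parameter names; only send the bounds that were supplied.
    if start_time is not None:
        params["start_time"] = start_time
    if end_time is not None:
        params["end_time"] = end_time
    url = "{}/users/{}/page_views".format(api_root, user_id)
    response = session.get(url, params=params)
    while True:
        response.raise_for_status()
        yield from response.json()
        next_link = response.links.get("next")
        if next_link is None:
            return
        # The next URL already carries the bookmark and the original filters.
        response = session.get(next_link["url"])

def first_n_page_views(session, api_root, user_id, limit, **bounds):
    """Stop after `limit` records; later pages are never requested."""
    return list(islice(iter_page_views(session, api_root, user_id, **bounds), limit))
```

Because the generator is lazy, breaking out of a `for` loop over it (or using `islice` as above) stops the requests at that point rather than draining the whole history.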

Another possibility is to use Canvas Data and the requests table for most of the information and then fetch the current information that hasn't made it into Canvas Data yet from the API.
