Handling Pagination

tyler_clair
Community Champion

Since so many of us use the APIs with a variety of languages and/or libraries, I am curious how everyone handles pagination with large GET requests. It seems this has been a hurdle for many of us to conquer before we can fully utilize the Canvas API. This could be a very interesting discussion, as we all handle the same problem in very different ways.

So I'll start...

I am partial to Python, so I use the Requests library (http://docs.python-requests.org/en/latest/), which makes working with the APIs extremely simple. Requests is a big part of what makes the pagination easy to handle.
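All of the snippets below assume Requests has been imported and that headers carries your Canvas API token, along these lines (the token value is a placeholder, not a real credential):

import requests

# Canvas reads the API token from an Authorization bearer header.
headers = {'Authorization': 'Bearer <YOUR_API_TOKEN>'}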

I start off by declaring a data_set list where each of the JSON objects will reside.

data_set = []

I then perform my initial API request and set the per_page limit to the max of 50.

I then save the response body to a variable called raw, using Requests' json() method to make it easier to work with.

uri = 'https://abc.instructure.com/api/v1/courses/12345/quizzes/67890/questions?per_page=50'
r = requests.get(uri, headers=headers)
raw = r.json()

I then loop through the response body and append each individual question to the data_set list.

for question in raw:
    data_set.append(question)

For the next pages I use a while loop to repeat the above process, using the URLs provided in the Link header of the response. As long as the current URL does not equal the last URL, it performs another request, with the next URL as the uri to be sent.

while r.links['current']['url'] != r.links['last']['url']:
    r = requests.get(r.links['next']['url'], headers=headers)
    raw = r.json()
    for question in raw:
        data_set.append(question)

The loop stops when the two URLs are equal, which denotes that all requests have been completed and none remain; at that point you have the entire data set.
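For context, Requests parses the Link header that Canvas returns into the r.links dict used above, a mapping keyed by each link's rel value. Roughly like this, with shortened, purely illustrative URLs:

r.links == {
    'current': {'url': '...&page=2&per_page=50', 'rel': 'current'},
    'next':    {'url': '...&page=3&per_page=50', 'rel': 'next'},
    'first':   {'url': '...&page=1&per_page=50', 'rel': 'first'},
    'last':    {'url': '...&page=9&per_page=50', 'rel': 'last'},
}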

You can then work with the data_set list to pull out the information you need. With some APIs this method may have to be modified slightly to accommodate how the response data is returned. It also may not be the best method, since it stores the whole data set in memory, and the system could conceivably run out of memory or perform slowly, but I have not run into a memory limit.
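If memory ever does become a concern, one variation is to process each item as its page arrives instead of accumulating everything first. Here is a minimal sketch of that idea, reusing the uri and headers from above (fetch_pages and process are hypothetical names, not part of the Canvas API):

def fetch_pages(uri, headers):
    # Yield items one page at a time instead of storing them all in memory.
    r = requests.get(uri, headers=headers)
    while True:
        for item in r.json():
            yield item
        if 'next' not in r.links:
            break  # Canvas no longer advertises a next page
        r = requests.get(r.links['next']['url'], headers=headers)

for question in fetch_pages(uri, headers):
    process(question)  # hypothetical per-item handler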

1 Solution

I'm not getting errors testing for a different condition in my Python loop. Perhaps you could try that and see what happens?

paginated = r.json()
while 'next' in r.links:
    r = requests.get(r.links['next']['url'], headers=headers)
    paginated.extend(r.json())
# paginated now holds the complete data set
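Testing for the presence of the 'next' key also sidesteps the KeyError you can hit with the current/last comparison, since Canvas can omit the 'last' link from the response headers for some collections.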
