Highlighted
Surveyor II

Handling Pagination

Since so many of us use the APIs with a variety of languages and/or libraries, I am curious how everyone handles pagination with large GET requests. It seems that this has been a hurdle many of us had to conquer before we could fully utilize the Canvas API. This could be a very interesting discussion, as we all handle the same problem in very different ways.

So I'll start...

I am partial to Python so I use the Requests library: http://docs.python-requests.org/en/latest/ to make working with the APIs extremely simple. Requests is a big part of what makes it easy to handle the pagination.

I start off by declaring a data_set list where each of the JSON objects will reside.

data_set = []

I then perform my initial API request and set the per_page limit to the max of 50.

I then save the response body to a variable called raw and use the built-in json() function to make it easier to work with.

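import requests

headers = {'Authorization': 'Bearer <ACCESS_TOKEN>'}  # placeholder: supply your own API token
uri = 'https://abc.instructure.com/api/v1/courses/12345/quizzes/67890/questions?per_page=50'
r = requests.get(uri, headers=headers)
raw = r.json()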

I then loop through the responses and pull out each individual response and append them to the data_set list.

for question in raw:
    data_set.append(question)

For the next pages I use a while loop to repeat the above process, using the links provided in the Link header of the response. As long as the current URL does not equal the last URL, it performs another request, using the next URL as the URI to be sent.

while r.links['current']['url'] != r.links['last']['url']:
    r = requests.get(r.links['next']['url'], headers=headers)
    raw = r.json()
    for question in raw:
        data_set.append(question)

The loop stops when they do equal each other as that denotes that all requests have been completed and there are none left, which means you have the entire data set.

You can then work with the data_set list to pull out the needed information. With some APIs this method may have to be modified slightly to accommodate how the response data is returned. This also may not be the best method, as it stores all the data in memory and the system could run out of memory or perform slowly, but I have not run into a memory limit.
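For anyone worried about the memory point, here is a minimal sketch of a variant that yields items one at a time and stops when the response carries no rel="next" link, rather than comparing the current and last URLs. The name iter_items and the idea of passing in the get callable (for example requests.get or a Session's get) are my own assumptions for illustration, not part of Requests or Canvas.

```python
def iter_items(get, url, **kwargs):
    """Yield every item from a paginated JSON-array endpoint.

    `get` is any callable that behaves like requests.get (hypothetical
    parameter, chosen so the loop can be tried without a network).
    """
    while url:
        response = get(url, **kwargs)
        response.raise_for_status()  # stop on HTTP errors instead of parsing them
        # Each page body is assumed to be a JSON array.
        yield from response.json()
        # Requests exposes the parsed Link header as a dict keyed by rel;
        # when there is no "next" entry, url becomes None and the loop ends.
        url = response.links.get("next", {}).get("url")


# Usage, assuming `requests` is installed and `headers` holds your token:
#   for question in iter_items(requests.get, uri, headers=headers):
#       print(question["id"])
```

Because it is a generator, only one page is held at a time; wrap the call in list(...) if you do want the whole data set in memory.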

59 Replies
Highlighted
Navigator

I'm using either PERL or PHP without any REST clients, just LWP or cURL, so your code makes this look elegant. Having discovered how to get the information by looking at the headers that were returned and scouring tons (electrons are heavy) of web pages, here's what I do:

  1. Get the first page and check for success before I do anything. Every now and then Canvas will fail to return an error.
  2. Concatenate the body text to a variable that is originally null
  3. Look for the presence of a Link: header and grab the next link.
  4. If there is a next link, then fetch the next page, starting at step 2.
  5. When done, look for the end and start of an array ][ without a comma between them and replace it by a comma. This makes the body into one large array, rather than a bunch.
  6. JSON decode the body into an array if the header type warrants it.

I've discovered that Canvas doesn't return a next link when it's on the last page, so that's how I decide when I'm done.
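For readers doing this without a REST client, steps 3 and 4 come down to parsing the Link header by hand. Below is a rough Python sketch of that parsing using only the standard library. The function name parse_link_header is hypothetical, and the regular expression assumes the header has the usual `<url>; rel="name"` shape.

```python
import re

# Matches one `<url>; rel="name"` entry. Matching whole entries avoids
# splitting on commas, which could also appear inside a URL.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^",;\s]+)"?')


def parse_link_header(value):
    """Return a dict such as {"next": url, "first": url} from a Link header."""
    return {rel: url for url, rel in _LINK_RE.findall(value or "")}


links = parse_link_header(
    '<https://abc.instructure.com/api/v1/courses?page=2&per_page=50>; rel="next",'
    '<https://abc.instructure.com/api/v1/courses?page=1&per_page=50>; rel="first"'
)
next_url = links.get("next")  # None on the last page, so the loop can stop
```

A missing or empty header simply gives an empty dict, which covers the "no next link on the last page" case.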

I know that storing the entire body in one string is memory inefficient. It was developed when I was first starting out and was more interested in using the results. I've begun converting the code into something similar to yours, where you keep appending the results to an array as they're fetched and only store the current page. I haven't been able to test it though, and was worried about losing any indexes that might be there (I haven't had time to examine all output returned to see if it ever comes through with nesting or something other than a numeric index).
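As a sketch of the "decode each page, then append" idea (in Python rather than Perl or PHP, and with a hypothetical function name merge_pages): decoding page by page avoids rewriting `][` in the raw text, which would also change a `][` that happens to sit inside a string value. A page that is not an array is kept as a single element, which is my assumption about how to treat non-list responses.

```python
import json


def merge_pages(bodies):
    """Combine the decoded contents of several page bodies into one list."""
    merged = []
    for body in bodies:
        page = json.loads(body)
        if isinstance(page, list):
            merged.extend(page)   # normal case: each page is a JSON array
        else:
            merged.append(page)   # assumption: keep a non-array page as one item
    return merged


pages = ['[{"id": 1, "note": "a][b"}]', '[{"id": 2}]']
combined = merge_pages(pages)
```

Here the note value keeps its `][` intact, and nested structures inside each item are carried over unchanged.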

Since we're talking about pagination, I check for Links on anything that isn't a POST. I wasn't sure if PUT, DELETE, or HEAD ever returned pagination, but I was sure that a POST didn't.

The process is inefficient and developed by trial & error, but it works. I'm thankful for your post as a way to see how clean it could be if I'd adopt a library.

Highlighted
Learner II

I am also a Perl guy.  Here's my technique

In my main body I loop as follows to iterate over the API calls until a 'next' URL is not found:

$nextURL = "";
do{
        if($nextURL eq ""){
                $getURL = Common::buildAPICommand($mode,"/api/v1/courses/${opt_c}/assignments/${opt_a}/submissions");
        }else{
                $getURL = $nextURL;
                $nextURL = "";
        }
        %response = Common::canvasAPI2($getURL, "GET");
        $json_text = $response{'json_text'};
        $nextURL = $response{'next'};
} until($nextURL eq "");

My first helper function just decides whether to append 'test' or not to the URL.   The magic happens in canvasAPI2, which calls the URL and formats the results into a hash %response.

sub canvasAPI2
{
# Returns a hash containing the json_text and the next link from the header
# This hash is called response
# response{json_text}
# response{next}
# response{prev}
# response{first}
# response{last}
# response{current}
#
  use LWP::UserAgent;
  use JSON -support_by_pp;
  use Try::Tiny;    # provides the try/catch blocks used below
  ($CMD, $MODE) = @_;
  # MODE defaults to GET.   Can be GET, POST, DELETE
  my %response = ();    # empty hash; {} would assign a hash reference instead
  my $json = new JSON;
  my $attempts = 0;
  my $json_text = -1;
  my $ua = LWP::UserAgent->new;
  my $key  = getCanvasKey();    #13~a7efq83rhq98feadhf823hffuh
  $ua->default_header("Authorization" => "Bearer $key");
  while($attempts <= 3){
    $attempts++;
    $success=0;
    my $res;
    if($MODE eq "DELETE"){
      $res = $ua->delete( $CMD);
    }elsif($MODE eq "POST"){
      $res = $ua->post( $CMD);
    }else{
      $res = $ua->get( $CMD);
    }
    $content = $res->decoded_content;
    $links = $res->header("LINK");
    chomp($links);
    # Each Link entry looks like <url>; rel="name"
    foreach $link ( split /,/, $links){
      ($url,$rel) = split /;/, $link;
      ($x, $loc) = split /=/, $rel;
      $loc =~ s/"//g;
      $url =~ s/[<>]//g;
      $response{"$loc"} = $url;
    }
    try{
      $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);
      $success=1;
    }catch{
      if($attempts == 3){print "Catch Error $_\n"; print "$CMD\n"; exit;}
    };
    if($success){last;}
  }
  if($DEBUG){
    $pretty_printed = $json->pretty->encode( $json_text );
    print "$pretty_printed\n";
  }
  $response{"json_text"} = $json_text;
  return %response;
}
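The retry idea in canvasAPI2 (ask again when the body does not decode as JSON, give up after three tries) might look roughly like this in Python. This is only a sketch: decode_with_retry and the fetch_body callable are hypothetical names, and fetch_body stands in for whatever function performs the HTTP request and returns the body text.

```python
import json


def decode_with_retry(fetch_body, attempts=3):
    """Call fetch_body() until its result parses as JSON, at most `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return json.loads(fetch_body())
        except json.JSONDecodeError as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(f"no valid JSON after {attempts} attempts") from last_error
```

Raising an exception instead of exiting leaves it to the caller to decide whether a failed page should stop the whole run.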

Highlighted
Learner II

tyler.clair@usu.edu

How did you markup the code in your question?   Is there a Jive document that shows all the options available?

Thanks, Glen

Highlighted

Hi Glen,

I don't know the answer either but here is what it looks like in HTML view:

<p>I start of by declaring a data_set list where each of the JSON objects will reside.</p>

<p></p>

<pre class="jive_text_macro jive_macro_code " jivemacro="code" ___default_attr="python" data-renderedposition="218_8_1232_16">

<p>data_set = []</p>

</pre>

<p></p>

<p>I then perform my initial API request and set the per_page limit to the max of 50.</p>

Highlighted

Hi Glen and Scott,

You have to switch to the advanced editor by clicking the Use advanced editor link on the top right of the editor.

adv-editor-link.png

After you have switched paste your code in and select it then click insert button (1).

Hover over Syntax Highlighting (2).

Then select the language (3).

syntax-highlighting.png

It looks like Perl will need to be added to the list of syntax highlighting but I think for now you could use the plain format to at least preserve indentation.

Highlighted
Navigator

glparker@usf.edu: tyler.clair@usu.edu beat me to the explanation, so I deleted mine. Two heads-ups. First, the advanced editor wasn't available in the summary view you get from the inbox or activity pages; you need to go into the full discussion. Second, I've twice used it where it didn't show all of the code that I pasted, and I had to go back and hit enter a few times at the bottom to get the code to show. It was there, it just didn't show. The code below is part of my API.pm module.

This code is either more or less developed than my PHP code depending on your perspective.  More developed because I have schema checking in PERL to make sure that I only pass valid parameters to the API. Less developed because the PHP code handles array parameters better and allows you to specify which parameters are SISable. Both of my code bases were set up to allow me to copy/paste the API call from the API Documentation and then do variable substitution on them. That variable substitution is done before the call to this subroutine.

sub _api {
  my $self = shift;
  my $cmd  = shift;
  my $opt  = shift;
  my $ua;
  $self->_check_init();
  if ( !defined($cmd) ) {
    return;
  }
  my $method;
  my $url;
  if ( $cmd =~ m{\A (GET|PUT|DELETE|HEAD|POST) [ ] (.*)$}xms ) {
    $method = uc($1);
    $url    = $self->{'_api'} . $2;
  }
  else {
    return;
  }
  if ( !defined($ua) ) {
    $ua = LWP::UserAgent->new;
    $ua->default_header( 'Authorization' => 'Bearer ' . $self->{'_token'} );
  }
  foreach my $key ( keys( %{$opt} ) ) {
    if ( !defined( $opt->{$key} ) ) {
      delete $opt->{$key};
    }
  }
  my $response;
  my $content = '';
  if ( $method =~ m{\A (POST) \z}xms ) {
    if ( $method eq 'POST' ) {
      if ( defined( $opt->{'attachment'} ) ) {
        $response =
          $ua->request( POST $url, $opt, 'Content-Type' => 'form-data' );
      }
      else {
        $response = $ua->request( POST $url, $opt );
      }
    }
    if ( $response->is_success
      && $response->content_type() eq 'application/json' )
    {
      $content = $response->content();
    }
  }
  else {
    my $fetch = 1;
    my $i     = 0;
    while ($fetch) {
      $fetch = 0;
      $i++;
      my $uri = URI->new($url);
      if ( defined($opt) ) {
        $uri->query_form($opt);
      }
      if ( $method eq 'GET' ) {
        $response = $ua->request( GET $uri);
      }
      if ( $method eq 'PUT' ) {
        $response = $ua->request( PUT $uri);
      }
      if ( $method eq 'DELETE' ) {
        $response = $ua->request( DELETE $uri);
      }
      if ( $method eq 'HEAD' ) {
        $response = $ua->request( HEAD $uri);
      }
      if ( $response->is_success
        && $response->content_type() eq 'application/json' )
      {
        $content .= $response->content();
        if ( my $linktext = $response->header('Link') ) {
          my @links = split(',', $linktext);
          foreach my $link (@links) {
            if ( $link =~ m{\A[<](.*?)[>][;][ ]rel="next"\z}xms ) {
              if ( $link =~ m{page=([0-9]+)&per_page=([0-9]+)}xms ) {
                $opt->{'page'}     = $1;
                $opt->{'per_page'} = $2;
                $fetch             = 1;
              }
              else {
                print("\n\nCannot match link $link\n\n");
              }
            }
          }
        }
      }
    }
    $content =~ s{\]\[}{,}xmsg;
  }
  my $json;
  if ( $content ne '' ) {
    $json = from_json($content);
  }
  else {
    print Dumper $response;
    croak;
  }
  return $json;
}
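The copy/paste-and-substitute step mentioned before the subroutine (take a call such as "GET /api/v1/courses/:course_id/..." from the API documentation and fill in the variables) is not shown in the Perl. A rough Python sketch of one way to do it follows; build_call and its parameters are hypothetical names of mine, not taken from API.pm.

```python
import re
from urllib.parse import quote


def build_call(template, base_url, **values):
    """Split 'METHOD /path/:name' and fill each :name placeholder from values."""
    method, path = template.split(" ", 1)

    def fill(match):
        # A placeholder with no matching keyword raises KeyError, which
        # surfaces a forgotten variable before any request is sent.
        return quote(str(values[match.group(1)]), safe="")

    return method.upper(), base_url.rstrip("/") + re.sub(r":(\w+)", fill, path)


method, url = build_call(
    "GET /api/v1/courses/:course_id/quizzes/:id/questions",
    "https://abc.instructure.com",
    course_id=12345,
    id=67890,
)
```

The resulting method and URL can then be handed to whatever routine performs the request and follows the pagination links.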

Highlighted
Learner II

Here's code I used in a presentation I made at InstructureCon 2013:

#######################################################
# published_courses.rb - a Ruby script for retrieving #
# all the ids for all courses                          #
# in the published state for a given sub-account on  #
# Instructure Canvas.                                #
# Dependencies: - Ruby V. 1.9 or greater              #
#              RUBY GEMS:                            #
#              - rest-client : Ruby REST library    #
#              - yaml : For reading YAML config file #
#              - cgi  : For retrieving URL GET vars  #
#              - uri  : For parsing URL GET vars    #
#              - json : For parsing JSON objects    #
#                        returned from API calls      #
#                                                    #
#        Jeffrey Anderson, M.Ed                      #
#        Multimedia Programmer Analyst                #
#        Center for Teaching and Learning            #
#        Mesa Community College                      #
#        jeffrey.anderson@mesacc.edu                  #
#        Maricopa County Community College District  #
#            http://ctl.mesacc.edu                  #
#            http://www.mesacc.edu                  #
#            http://www.maricopa.edu                #
#                                                    #
#######################################################
#importing required ruby gem libraries
require 'rest_client'
require 'json'
require 'yaml'
require 'cgi'
require 'uri'
#config file containing all necessary configuration options
config = nil
#the array of JSON objects returned from the Courses API calls
$courses_json_arr = []
#the array of course ids that will be used to retrieve the teacher enrollment data
$course_ids = []
#the array of enrollment API call urls used to get the teacher enrollment data
#$enrollments_calls = []
#Instructor emails array
#$instructor_emails = []
#Instructor names array
#$instructor_names = []
#sub-account's name
#$subaccount_name
#Load and parse the configuration YAML file. It must be called "courses_config.yaml"
parsed = begin
  config = YAML.load(File.open("courses_config.yaml"))
rescue Exception => e
  puts "Could not parse config YAML: #{e.message}"
end
auth_token = "Bearer " + config["api_key"]
subaccount_name_req = RestClient.get config["base_url"] + "/api/v1/accounts/" + config["subaccount_id"], :Authorization => auth_token
subaccount_json = JSON.parse subaccount_name_req.to_s
$subaccount_name = subaccount_json["name"]
#$subaccount_name = subaccount_json["name"]
#puts $subaccount_name
per_page = "1"
enrollment_term_id = config["enrollment_term_id"]
subaccount_url = config["base_url"] + "/api/v1/accounts/" + config["subaccount_id"] + "/courses/?page=1&per_page=" + per_page + "&enrollment_term_id=" + enrollment_term_id + "&published=false"
#puts subaccount_url
#puts auth_token
first_res = RestClient.get subaccount_url, :Authorization => auth_token
links = first_res.headers[:link]
#puts links
all_links = links.split(",")
#the last link is the 3rd in the Links header, and the actual URL is inside the first in the chunk that's returned. It looks like this:
#<https://<canvas-instance>/api/v1/accounts/:id/courses?page=<total_number_of_courses>&per_page=1>; rel="last"
#So the page parameter on the query string is the total number of courses if you request one course at a time. <total_number_of_courses>
#will contain an arbitrary number depending on how many total courses there are in the sub-account. This number is important because it's how many subsequent
#times to continue to run the loop to grab the courses in the least amount of API server requests necessary
next_link = all_links[0].split(";")[0].gsub(/\<|\>/, "")
last = all_links[2].split(";")[0].gsub(/\<|\>/, "")
last_uri = URI.parse(last)
uri_parms = CGI.parse(last_uri.query)
#num_courses is currently a String variable at this point
$num_courses = uri_parms["page"][0]
$num_courses = $num_courses.to_i
puts "Number of courses found in sub-account: " + $num_courses.to_s
puts "Getting all Course Info in chunks of " + config["per_page"]
per_page = "50"
$course_collected = 0
$courses_json_arr = []
subaccount_url = config["base_url"] + "/api/v1/accounts/" + config["subaccount_id"] + "/courses/?page=1&per_page=" + per_page + "&enrollment_term_id=" + enrollment_term_id + "&published=false"
#puts subaccount_url
while $course_collected < $num_courses
    response = RestClient.get subaccount_url, :Authorization => auth_token
    links = response.headers[:link]
    all_links = links.split(",")
    #the next link is the 1st in the Links header if there are more pages in the pagination structure, and the actual URL is inside the
    #first in the chunk that's returned. It should look like this:
    #Link: <https://<canvas-instance>/api/v1/accounts/:id/courses?page=2&per_page=50>; rel="next"
    next_link = all_links[0].split(';')[0].gsub(/\<|\>/, "")
    #puts next_link
    subaccount_url = next_link
    temp_arr = JSON.parse response.to_s
    $courses_json_arr.concat temp_arr
    #puts $courses_json_arr[0]["id"]
    #puts $courses_json_arr.length
    #puts $courses_json_arr[$courses_json_arr.length]
    $course_collected += temp_arr.length
    #puts "Next link: " + next_link
    #num_courses += per_page.to_i
    puts "Total Number of Course IDs Collected: " + $course_collected.to_s
end
$output_file = File.new("course_ids.txt", 'w')
$courses_json_arr.each { |x|
  #$course_ids.push x["id"]
  $output_file.puts x["id"]
}
puts "# of published course ids collected: #{$course_collected}"
puts "Course IDs File Saved to Disk"
$output_file.close

Here's the presentation itself:

https://www.youtube.com/watch?v=c8L-psDDpYE
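The counting trick in the script (request per_page=1, then read the page number out of the rel="last" link) can be sketched in Python with the standard library. The function name total_from_last_link is hypothetical. The sketch looks the link up by its rel value rather than by position, and it returns None when there is no rel="last" entry or when its page value is not a plain number, since I would not assume every endpoint supplies one.

```python
import re
from urllib.parse import urlparse, parse_qs


def total_from_last_link(link_header):
    """Return the page number of the rel="last" link, or None if unavailable."""
    entries = re.findall(r'<([^>]*)>\s*;\s*rel="?([^",;\s]+)"?', link_header or "")
    for url, rel in entries:
        if rel == "last":
            pages = parse_qs(urlparse(url).query).get("page")
            try:
                return int(pages[0]) if pages else None
            except ValueError:
                return None  # page value was not a plain integer
    return None


header = (
    '<https://abc.instructure.com/api/v1/accounts/1/courses?page=1&per_page=1>; rel="current",'
    '<https://abc.instructure.com/api/v1/accounts/1/courses?page=2&per_page=1>; rel="next",'
    '<https://abc.instructure.com/api/v1/accounts/1/courses?page=137&per_page=1>; rel="last"'
)
total = total_from_last_link(header)
```

With per_page=1 the last page number equals the item count, so here total is 137. When the function returns None, following rel="next" until it disappears still works.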

Highlighted
Learner II

I recently found a python example on github that steps through pagination. It is very similar.

https://github.com/kajigga/canvas-contrib/blob/master/API_Examples/PullCourseGrades/pull_grades.py

students_endpoint = BASE_URL % '/courses/%s/students' % (course_id)
# Create a request, adding the REQUEST_HEADERS to it for authentication
not_done = True
students = []
url = students_endpoint
while not_done:
  student_request = requests.get(url,headers=REQUEST_HEADERS)
  students+=student_request.json()
  if 'next' in student_request.links.keys():
    # Requests stores each parsed link's address under the 'url' key
    url = student_request.links['next']['url']
  else:
    not_done = False

Highlighted
Community Member

Hi,

Sorry this reply is a bit late, but I thought it may still be helpful to folks.

We have been developing a Canvas sdk in python, it's up on github here: penzance/canvas_python_sdk · GitHub

Please feel free to check it out.

Specifically to your question about pagination: the SDK has a method, "get_all_list_data", that you can use to make any Canvas API call. It iterates over the next responses until all data is returned.

Here's an example using the call to get all enrollments for a course:

canvas_enrollments = get_all_list_data(SDK_CONTEXT, enrollments.list_enrollments_courses, canvas_course_id)

I hope this is helpful!

Regards,

Eric