Since so many of us use the APIs with a variety of languages and/or libraries, I am curious how everyone handles pagination with large GET requests. It seems that this has been a hurdle many of us had to clear before we could fully utilize the Canvas API. This could be a very interesting discussion, as we all handle the same problem in very different ways.
So I'll start...
I am partial to Python so I use the Requests library: http://docs.python-requests.org/en/latest/ to make working with the APIs extremely simple. Requests is a big part of what makes it easy to handle the pagination.
I start off by declaring a data_set list where each of the JSON objects will reside.
data_set = []
I then perform my initial API request and set the per_page limit to the max of 50.
I then save the response body to a variable called raw and use the built-in json() function to make it easier to work with.
uri = 'https://abc.instructure.com/api/v1/courses/12345/quizzes/67890/questions?per_page=50'
r = requests.get(uri, headers=headers)
raw = r.json()
I then loop through the responses and pull out each individual response and append them to the data_set list.
for question in raw:
    data_set.append(question)
For the next pages I use a while loop to repeat the above process using the links provided in the Link header of the response. As long as the current URL does not equal the last URL, it performs another request, using the next URL as the URI to be sent.
while r.links['current']['url'] != r.links['last']['url']:
    r = requests.get(r.links['next']['url'], headers=headers)
    raw = r.json()
    for question in raw:
        data_set.append(question)
The loop stops when they do equal each other as that denotes that all requests have been completed and there are none left, which means you have the entire data set.
You can then work with the data_set list to pull out the needed information. With some APIs this method may have to be modified slightly to accommodate how the response data is returned. This also may not be the best method, as it stores all the data in memory and the system could run out of memory or perform slowly, but I have not run into a memory limit.
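If memory ever does become a concern, one option is to yield items as each page arrives instead of collecting everything first. Below is a minimal sketch of that idea, not a tested Canvas client. It assumes the endpoint returns a JSON array and that the server leaves out the `next` link on the last page; the `iter_paginated` name and the optional `session` parameter are made up for illustration.

```python
def iter_paginated(url, headers, session=None):
    """Yield each item from a paginated endpoint, one page at a time."""
    if session is None:
        # Third-party dependency; only imported when no session is supplied.
        import requests
        session = requests.Session()
    while url:
        response = session.get(url, headers=headers)
        response.raise_for_status()
        # Hand the items of this page to the caller before fetching more.
        yield from response.json()
        # requests parses the Link header into response.links; a missing
        # 'next' entry is treated here as "this was the last page".
        url = response.links.get('next', {}).get('url')
```

Usage would look like `for question in iter_paginated(uri, headers): ...`, so only one page of results is held at a time unless the caller decides to keep them.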
I'm using either PERL or PHP without any REST clients, just LWP or cURL, so your code makes this look elegant. Having discovered how to get the information by looking at the headers that were returned and scouring tons (electrons are heavy) of web pages, here's what I do:
I've discovered that Canvas doesn't return a next link when it's on the last page, so that's how I decide when I'm done.
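For anyone else working without a REST client, the "no next link means done" check only needs the raw Link header. Here is a rough sketch in Python using just the standard library. The `parse_link_header` name is hypothetical, and the pattern assumes links of the form `<url>; rel="name"` separated by commas, which is what the headers quoted in this thread look like.

```python
import re

# One <url>; rel="name" entry. Matching inside the angle brackets means a
# comma inside a URL does not split the entry in the wrong place.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')

def parse_link_header(header):
    """Return a dict mapping rel names ('next', 'last', ...) to URLs."""
    if not header:
        return {}
    return {rel: url for url, rel in _LINK_RE.findall(header)}

def has_more_pages(header):
    """True while the response still advertises a 'next' page."""
    return 'next' in parse_link_header(header)
```

A fetch loop would then keep requesting `parse_link_header(header)['next']` until `has_more_pages(header)` turns false.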
I know that storing the entire body in one string is memory inefficient. It was developed when I was first starting out and was more interested in using the results. I've begun converting the code into something similar to yours, where you keep appending the results to an array as they're fetched and only store the current page. I haven't been able to test it though, and was worried about losing any indexes that might be there (I haven't had time to examine all the output returned to see if it ever comes through with nesting or something other than a numeric index).
Since we're talking about pagination, I check for Links on anything that isn't a POST. I wasn't sure if PUT, DELETE, or HEAD ever returned pagination, but I was sure that a POST didn't.
The process is inefficient and developed by trial & error, but it works. I'm thankful for your post as a way to see how clean it could be if I'd adopt a library.
I am also a Perl guy. Here's my technique
In my main body I do a loop as follows to iterate over the API calls until a 'next' URL is not found
$nextURL = "";
do{
    if($nextURL eq ""){
        $getURL = Common::buildAPICommand($mode,"/api/v1/courses/${opt_c}/assignments/${opt_a}/submissions");
    }else{
        $getURL = $nextURL;
        $nextURL = "";
    }
    %response = Common::canvasAPI2($getURL, "GET");
    $json_text = $response{'json_text'};
    $nextURL = $response{'next'};
} until($nextURL eq "");
My first helper function just decides whether to append 'test' or not to the URL. The magic happens in canvasAPI2, which calls the URL and formats the results into a hash %response.
sub canvasAPI2
{
    # Returns a hash containing the json_text and the next link from the header
    # This hash is called response
    # response{json_text}
    # response{next}
    # response{prev}
    # response{first}
    # response{last}
    # response{current}
    #
    use LWP::UserAgent;
    use JSON -support_by_pp;
    use Try::Tiny;    # assumed source of the try/catch blocks used below
    ($CMD, $MODE) = @_;
    # MODE defaults to GET. Can be GET, POST, DELETE
    my %response = ();    # an empty list, not {}, initializes an empty hash
    my $json = new JSON;
    my $attempts = 0;
    my $json_text = -1;
    my $ua = LWP::UserAgent->new;
    my $key = getCanvasKey(); #13~a7efq83rhq98feadhf823hffuh
    $ua->default_header("Authorization" => "Bearer $key");
    # Retry the request until the body decodes as JSON, giving up on the 3rd failure
    while($attempts <= 3){
        $attempts++;
        $success=0;
        my $res;
        if($MODE eq "DELETE"){
            $res = $ua->delete( $CMD);
        }elsif($MODE eq "POST"){
            $res = $ua->post( $CMD);
        }else{
            $res = $ua->get( $CMD);
        }
        $content = $res->decoded_content;
        # Split the Link header into rel => URL pairs (next, prev, first, last, current)
        $links = $res->header("LINK");
        chomp($links);
        foreach $link ( split /,/, $links){
            ($url,$rel) = split /;/, $link;
            ($x, $loc) = split /=/, $rel;
            $loc =~ s/"//g;
            $url =~ s/[<>]//g;
            $response{"$loc"} = $url;
        }
        try{
            $json_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);
            $success=1;
        }catch{
            if($attempts == 3){print "Catch Error $_\n"; print "$CMD\n"; exit;}
        };
        if($success){last;}
    }
    if($DEBUG){
        $pretty_printed = $json->pretty->encode( $json_text );
        print "$pretty_printed\n";
    }
    $response{"json_text"} = $json_text;
    return %response;
}
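For readers following along in Python, the retry part of canvasAPI2 (call the URL again if the body does not decode as JSON, and give up after three tries) could be sketched roughly like this. It is an illustration rather than a port: `fetch_json_with_retry` and the `fetch` callable are hypothetical names, and `fetch` is assumed to take a URL and return the response body as a string.

```python
import json

def fetch_json_with_retry(fetch, url, max_attempts=3):
    """Call fetch(url) until the body parses as JSON, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        body = fetch(url)
        try:
            return json.loads(body)
        except json.JSONDecodeError as exc:
            # Remember the failure and try the request again.
            last_error = exc
    raise RuntimeError(
        f"no valid JSON from {url} after {max_attempts} attempts"
    ) from last_error
```

Raising an exception instead of calling exit leaves the decision about what to do next with the caller.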
How did you markup the code in your question? Is there a Jive document that shows all the options available?
Thanks, Glen
Hi Glen,
I don't know the answer either but here is what it looks like in HTML view:
<p>I start of by declaring a data_set list where each of the JSON objects will reside.</p>
<p></p>
<pre class="jive_text_macro jive_macro_code " jivemacro="code" ___default_attr="python" data-renderedposition="218_8_1232_16">
<p>data_set = []</p>
</pre>
<p></p>
<p>I then perform my initial API request and set the per_page limit to the max of 50.</p>
Hi Glen and Scott,
You have to switch to the advanced editor by clicking the Use advanced editor link on the top right of the editor.
After you have switched paste your code in and select it then click insert button (1).
Hover over Syntax Highlighting (2).
Then select the language (3).
It looks like Perl will need to be added to the list of syntax highlighting but I think for now you could use the plain format to at least preserve indentation.
glparker@usf.edu : tyler.clair@usu.edu beat me to the explanation, so I deleted mine. Two heads-up. The first is that the advanced editor wasn't available in the summary view you get from the inbox or activity pages; you need to go into the full discussion. The second is that I've twice used it where it didn't show all of the code that I pasted and I had to go back and hit enter a few times at the bottom to get the code to show. It was there, it just didn't show. The code below is part of my API.pm module.
This code is either more or less developed than my PHP code depending on your perspective. More developed because I have schema checking in PERL to make sure that I only pass valid parameters to the API. Less developed because the PHP code handles array parameters better and allows you to specify which parameters are SISable. Both of my code bases were set up to allow me to copy/paste the API call from the API Documentation and then do variable substitution on them. That variable substitution is done before the call to this subroutine.
sub _api {
    my $self = shift;
    my $cmd = shift;
    my $opt = shift;
    my $ua;
    $self->_check_init();
    if ( !defined($cmd) ) {
        return;
    }
    my $method;
    my $url;
    if ( $cmd =~ m{\A (GET|PUT|DELETE|HEAD|POST) [ ] (.*)$}xms ) {
        $method = uc($1);
        $url = $self->{'_api'} . $2;
    }
    else {
        return;
    }
    if ( !defined($ua) ) {
        $ua = LWP::UserAgent->new;
        $ua->default_header( 'Authorization' => 'Bearer ' . $self->{'_token'} );
    }
    foreach my $key ( keys( %{$opt} ) ) {
        if ( !defined( $opt->{$key} ) ) {
            delete $opt->{$key};
        }
    }
    my $response;
    my $content = '';
    if ( $method =~ m{\A (POST) \z}xms ) {
        if ( $method eq 'POST' ) {
            if ( defined( $opt->{'attachment'} ) ) {
                $response =
                  $ua->request( POST $url, $opt, 'Content-Type' => 'form-data' );
            }
            else {
                $response = $ua->request( POST $url, $opt );
            }
        }
        if ( $response->is_success
            && $response->content_type() eq 'application/json' )
        {
            $content = $response->content();
        }
    }
    else {
        my $fetch = 1;
        my $i = 0;
        while ($fetch) {
            $fetch = 0;
            $i++;
            my $uri = URI->new($url);
            if ( defined($opt) ) {
                $uri->query_form($opt);
            }
            if ( $method eq 'GET' ) {
                $response = $ua->request( GET $uri);
            }
            if ( $method eq 'PUT' ) {
                $response = $ua->request( PUT $uri);
            }
            if ( $method eq 'DELETE' ) {
                $response = $ua->request( DELETE $uri);
            }
            if ( $method eq 'HEAD' ) {
                $response = $ua->request( HEAD $uri);
            }
            if ( $response->is_success
                && $response->content_type() eq 'application/json' )
            {
                $content .= $response->content();
                if ( my $linktext = $response->header('Link') ) {
                    my @links = split(',', $linktext);
                    foreach my $link (@links) {
                        if ( $link =~ m{\A[<](.*?)[>][;][ ]rel="next"\z}xms ) {
                            if ( $link =~ m{page=([0-9]+)&per_page=([0-9]+)}xms ) {
                                $opt->{'page'} = $1;
                                $opt->{'per_page'} = $2;
                                $fetch = 1;
                            }
                            else {
                                print("\n\nCannot match link $link\n\n");
                            }
                        }
                    }
                }
            }
        }
        $content =~ s{\]\[}{,}xmsg;
    }
    my $json;
    if ( $content ne '' ) {
        $json = from_json($content);
    }
    else {
        print Dumper $response;
        croak;
    }
    return $json;
}
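The copy/paste-from-the-documentation idea translates to other languages too. Here is a small Python sketch of what the variable substitution step might look like. This is a guess at the general shape, not Glen's actual code: `build_request` is a hypothetical name, and it assumes the pasted call uses `:name` placeholders the way the Canvas API documentation writes them.

```python
import re

# "METHOD path", the same shape the Perl regex above accepts.
_CMD_RE = re.compile(r'\A(GET|PUT|DELETE|HEAD|POST) (.*)\Z', re.DOTALL)
# Placeholders such as :course_id in a documented API path.
_PLACEHOLDER_RE = re.compile(r':([A-Za-z_]+)')

def build_request(cmd, values, base_url):
    """Split 'GET /path/:id' into (method, full URL) with placeholders filled in."""
    match = _CMD_RE.match(cmd)
    if match is None:
        raise ValueError(f"unrecognized API call: {cmd!r}")
    method, path = match.groups()

    def substitute(placeholder):
        # A missing value raises KeyError rather than sending a bad URL.
        return str(values[placeholder.group(1)])

    return method, base_url + _PLACEHOLDER_RE.sub(substitute, path)
```

For example, `build_request("GET /api/v1/courses/:course_id/quizzes/:quiz_id/questions", {"course_id": 12345, "quiz_id": 67890}, "https://abc.instructure.com")` gives back the method and the full question-list URL from the first post.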
Here's code I used in a presentation I made at InstructureCon 2013:
#######################################################
# published_courses.rb - a Ruby script for retrieving #
# all the ids for all courses #
# in the published state for a given sub-account on #
# Instructure Canvas. #
# Dependencies: - Ruby V. 1.9 or greater #
# RUBY GEMS: #
# - rest-client : Ruby REST library #
# - yaml : For reading YAML config file #
# - cgi : For retrieving URL GET vars #
# - uri : For parsing URL GET vars #
# - json : For parsing JSON objects #
# returned from API calls #
# #
# Jeffrey Anderson, M.Ed #
# Multimedia Programmer Analyst #
# Center for Teaching and Learning #
# Mesa Community College #
# jeffrey.anderson@mesacc.edu #
# Maricopa County Community College District #
# #
#######################################################
#importing required ruby gem libraries
require 'rest_client'
require 'json'
require 'yaml'
require 'cgi'
require 'uri'
#config file containing all necessary configuration options
config = nil
#the array of JSON objects returned from the Courses API calls
$courses_json_arr = []
#the array of course ids that will be used to retrieve the teacher enrollment data
$course_ids = []
#the array of enrollment API call urls used to get the teacher enrollment data
#$enrollments_calls = []
#Instructor emails array
#$instructor_emails = []
#Instructor names array
#$instructor_names = []
#sub-account's name
#$subaccount_name
#Load and parse the configuration YAML file. It must be called "courses_config.yaml"
parsed = begin
  config = YAML.load(File.open("courses_config.yaml"))
rescue Exception => e
  puts "Could not parse config YAML: #{e.message}"
end
auth_token = "Bearer " + config["api_key"]
subaccount_name_req = RestClient.get config["base_url"] + "/api/v1/accounts/" + config["subaccount_id"], :Authorization => auth_token
subaccount_json = JSON.parse subaccount_name_req.to_s
$subaccount_name = subaccount_json["name"]
#$subaccount_name = subaccount_json["name"]
#puts $subaccount_name
per_page = "1"
enrollment_term_id = config["enrollment_term_id"]
subaccount_url = config["base_url"] + "/api/v1/accounts/" + config["subaccount_id"] + "/courses/?page=1&per_page=" + per_page + "&enrollment_term_id=" + enrollment_term_id + "&published=false"
#puts subaccount_url
#puts auth_token
first_res = RestClient.get subaccount_url, :Authorization => auth_token
links = first_res.headers[:link]
#puts links
all_links = links.split(",")
#the last link is the 3rd in the Link header, and the actual URL is the first part of that chunk. It looks like this:
#<https://<canvas-instance>/api/v1/accounts/:id/courses?page=<total_number_of_courses>&per_page=1>; rel="last"
#So the page parameter on the query string is the total number of courses if you request one course at a time. <total_number_of_courses>
#will contain an arbitrary number depending on how many total courses there are in the sub-account. This number is important because it determines how many
#more times to run the loop to grab the courses in the fewest API server requests necessary
next_link = all_links[0].split(";")[0].gsub(/\<|\>/, "")
last = all_links[2].split(";")[0].gsub(/\<|\>/, "")
last_uri = URI.parse(last)
uri_parms = CGI.parse(last_uri.query)
#num_courses is currently a String variable at this point
$num_courses = uri_parms["page"][0]
$num_courses = $num_courses.to_i
puts "Number of courses found in sub-account: " + $num_courses.to_s
puts "Getting all Course Info in chunks of " + config["per_page"]
per_page = "50"
$course_collected = 0
$courses_json_arr = []
subaccount_url = config["base_url"] + "/api/v1/accounts/" + config["subaccount_id"] + "/courses/?page=1&per_page=" + per_page + "&enrollment_term_id=" + enrollment_term_id + "&published=false"
#puts subaccount_url
while $course_collected < $num_courses
  response = RestClient.get subaccount_url, :Authorization => auth_token
  links = response.headers[:link]
  all_links = links.split(",")
  #the next link is the 1st in the Links header if there are more pages in the pagination structure, and the actual URL is inside the
  #first in the chunk that's returned. It should look like this:
  #Link: <https://<canvas-instance>/api/v1/accounts/:id/courses?page=2&per_page=50>; rel="next"
  next_link = all_links[0].split(';')[0].gsub(/\<|\>/, "")
  #puts next_link
  subaccount_url = next_link
  temp_arr = JSON.parse response.to_s
  $courses_json_arr.concat temp_arr
  #puts $courses_json_arr[0]["id"]
  #puts $courses_json_arr.length
  #puts $courses_json_arr[$courses_json_arr.length]
  $course_collected += temp_arr.length
  #puts "Next link: " + next_link
  #num_courses += per_page.to_i
  puts "Total Number of Course IDs Collected: " + $course_collected.to_s
end
$output_file = File.new("course_ids.txt", 'w')
$courses_json_arr.each { |x|
  #$course_ids.push x["id"]
  $output_file.puts x["id"]
}
puts "# of published course ids collected: #{$course_collected}"
puts "Course IDs File Saved to Disk"
$output_file.close
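The counting trick in the script above (ask for one item per page, then read the page number off the `last` link) can be sketched in Python with only the standard library. Treat it as an illustration under assumptions: `total_from_last_link` is a made-up name, and it only works when the response actually includes a `last` link with a numeric `page` value, which as far as I know Canvas does not guarantee for every endpoint.

```python
from urllib.parse import urlparse, parse_qs

def total_from_last_link(last_url):
    """Return the numeric 'page' value of a rel="last" URL, or None.

    When the request used per_page=1, this page number is also the
    total number of items.
    """
    query = parse_qs(urlparse(last_url).query)
    pages = query.get('page')
    if not pages or not pages[0].isdigit():
        # No page parameter, or a non-numeric page token.
        return None
    return int(pages[0])
```

Looking links up by their rel name rather than by position in the header, and falling back to following `next` when this returns None, would make the approach less dependent on the header layout.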
Here's the presentation itself:
I recently found a python example on github that steps through pagination. It is very similar.
https://github.com/kajigga/canvas-contrib/blob/master/API_Examples/PullCourseGrades/pull_grades.py
students_endpoint = BASE_URL % '/courses/%s/students' % (course_id)
# Create a request, adding the REQUEST_HEADERS to it for authentication
not_done = True
students = []
url = students_endpoint
while not_done:
    student_request = requests.get(url,headers=REQUEST_HEADERS)
    students+=student_request.json()
    # requests exposes each parsed link's address under the 'url' key
    if 'next' in student_request.links.keys():
        url = student_request.links['next']['url']
    else:
        not_done = False
Hi,
Sorry this reply is a bit late, but I thought it may still be helpful to folks.
We have been developing a Canvas SDK in Python; it's up on GitHub here: penzance/canvas_python_sdk
Please feel free to check it out.
Specifically, on your question about pagination: the SDK has a method "get_all_list_data" that you can use to make any Canvas API call. This call will iterate over the next responses until all data is returned.
Here's an example using the call to get all enrollments for a course:
canvas_enrollments = get_all_list_data(SDK_CONTEXT, enrollments.list_enrollments_courses, canvas_course_id)
I hope this is helpful!
Regards,
Eric