Hi all,
I am an admin and Python novice. I am trying to create a script that will pull a user's pageviews within a date range and save them to a CSV file. To date, I have only been able to access the first page of results. I can't seem to get the script to navigate through subsequent pages.
I have seen posts with example script snippets (here, here, and here) but am having a difficult time implementing those strategies. I have attached two example files, one using the Requests module and one using the CanvasAPI module.
Can someone please look at my scripts and point out what I am missing?
Thank you in advance for any assistance.
Hi @rexj -- Are you sure that there are more than 100 records for the person you are searching for? I ran your script successfully with minimal changes (just changing the base url, access token and student id) for a date range of an entire semester and pulled down 13,756 rows for a student. This was the "pageviews-requests-2a.TXT" script.
Thanks @mclark19 ! I am glad to hear that the script worked for you. At least I know the script works for someone. 😉
Yes, I pulled the pageviews for a user for a single date (2/18/25) from the user account details page, downloading the CSV. There are 280 pageviews in that CSV. I am not sure why I seem to only be able to pull 100 through the API.
Do you have any thoughts about the cause of the discrepancy?
Hi @rexj I had a trailing slash on mine (e.g., https://YOUR-INSTANCE.instructure.com/api/v1/), but the same other than that. Are you seeing more than one url being output? If you reduce your "per_page" to 10, do you only get 10 results?
The only thing I could think of might be a timing issue due to converting the timezone, but that doesn't make sense for missing so many records (unless the student is only active at a particular time).
Thank you.
If I change the per_page parameter to a value less than or equal to 100, then I receive that many results. Changing it to more than 100 doesn't yield any more results. I think that 100 is the maximum number of links in a Pageview item.
I am going to try just passing the date without the time to see if that makes a difference.
@mclark19 How was your base URL formatted? Was it the same as mine?
Hi @rexj,
I was able to run the 2a version of your code and get 750+ results for myself during a 1-week period in Jan, which seems to match everything else I see, and matches results from code I wrote myself pretty quickly using some functions I've written for other projects in the past.
It's very hard to explain why it's working for other people but not you. I guess the biggest clue is that you stated if you change your per_page to 10, you only get 10 results. That indicates for some reason in your environment, the pagination isn't working. Maybe you could add some additional debug print lines in that area just to see what exactly is executing and perhaps go from there?
-Chris
Thanks @chriscas
I added a debug print:
def get_pageviews(user_id, start_date, end_date):
    url = f'{base_url}/users/{user_id}/page_views'
    params = {
        'start_time': start_date_utc,
        'end_time': end_date_utc,
        'per_page': 100
    }
    pageviews = []
    while url:
        try:
            print(f"Request URL: {url}")  # Debug print
            response = requests.get(url, headers=headers, params=params)
            response.raise_for_status()
            data = response.json()
            pageviews.extend(data)
            if 'next' in response.links:
                url = response.links['next']['url']
                params = None  # Clear params for subsequent requests
            else:
                url = None
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            break
    return pageviews
After running the script I receive the following:
Request URL: https://lawrence.instructure.com/api/v1/users/3311/page_views
Request URL: https://lawrence.instructure.com/api/v1/users/3311/page_views?end_time=2025-02-19T05%3A59%3A59%2B00%...
The results have been written to pageviews.csv
Some questions:
Thank you.
Hi @rexj,
I usually send the parameters for GET calls as querystrings, which is really how the 'next' URLs work. With that being said, you're correct subsequent next page calls shouldn't need any extra parameters added, just use the next url as given back by the previous call. I'd suggest the revisions below (just editing by hand without testing):
def get_pageviews(user_id, start_date, end_date):
    url = f'{base_url}/users/{user_id}/page_views?start_time=start_date_utc&end_time=end_date_utc&per_page=100'
    pageviews = []
    while url:
        try:
            print(f"Request URL: {url}")  # Debug print
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            data = response.json()
            pageviews.extend(data)
            if 'next' in response.links:
                url = response.links['next']['url']
            else:
                url = None
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            break
    return pageviews
I'm still a bit perplexed by the fact that your original 2a code ran just fine for me and @mclark19, yet it apparently doesn't run correctly for you. Something strange is definitely going on, but let us know if my suggested code here makes any difference for you.
I can also make a public version of the code I developed for this, which is really similar to yours but has a bunch of extra error checking for the Canvas environment, which adds some perhaps unneeded complexity for your smaller project.
-Chris
Thanks @chriscas
I upgraded requests. I was running 2.31.0 and upgraded to 2.32.3.
I ran the code with your update and received:
PS C:\Users\rexj-a\Desktop\canvasapi> python pageviews-requests-2a.py
Request URL: https://lawrence.instructure.com/api/v1/users/3311/page_views?start_time=start_date_utc&end_time=end...
An error occurred: 503 Server Error: Service Unavailable for url: https://lawrence.instructure.com/api/v1/users/3311/page_views?start_time=start_date_utc&end_time=end...
The results have been written to pageviews.csv
I noticed that the Request URL includes the start and end date variable names but not the explicit UTC date/time values. I added the curly brackets around those variables in the url f-string and things worked better.
I still only received 100 pageviews.
My supervisor wondered if it could be due to my user generated API access key, and so I tried creating a Developer key. That did not work, resulting in a 401 permissions error.
Thanks for your help in trying to sort this out.
Hi @rexj,
Whoops, I knew I'd miss something editing by hand... I forgot the curly braces around the variables in the url.
def get_pageviews(user_id, start_date, end_date):
    url = f'{base_url}/users/{user_id}/page_views?start_time={start_date_utc}&end_time={end_date_utc}&per_page=100'
    pageviews = []
    while url:
        try:
            print(f"Request URL: {url}")  # Debug print
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            data = response.json()
            pageviews.extend(data)
            if 'next' in response.links:
                url = response.links['next']['url']
            else:
                url = None
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            break
    return pageviews
What version of Python itself are you running, just in case that's an issue? Trying to use a developer key will definitely complicate things. I usually just run my own scripts with a self-generated API token, though I am a full admin so I can access anything with that. If you have a more limited role, some things may not work at all, but pagination should not be affected.
I was just copying the code you had without making too many modifications, but I notice you're passing start_date and end_date to your function, but using start_date_utc and end_date_utc in your function, which are defined outside. You may want to do some code cleanup on that.
See if the above code improves anything for you (again, apologies for the issue).
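On the cleanup point, a minimal sketch of what I mean is below (untested against a live instance). It assumes `session` is a `requests.Session` that already carries your Authorization header. The parameter names and the `per_page` default are my own choices for illustration, not something Canvas requires.

```python
def get_pageviews(session, base_url, user_id, start_time, end_time, per_page=100):
    """Collect every page of a user's page views by following 'next' links.

    `session` is assumed to be a requests.Session (or any object whose
    .get() returns a response with .raise_for_status(), .json() and .links).
    """
    url = f"{base_url}/users/{user_id}/page_views"
    # The query parameters are only needed on the first request; the 'next'
    # URL handed back by the server already includes them.
    params = {"start_time": start_time, "end_time": end_time, "per_page": per_page}
    pageviews = []
    while url:
        response = session.get(url, params=params)
        response.raise_for_status()
        pageviews.extend(response.json())
        # response.links is empty or lacks 'next' on the last page.
        url = response.links.get("next", {}).get("url")
        params = None
    return pageviews
```

Everything the function needs now comes in through its arguments, so it no longer depends on `start_date_utc` and `end_date_utc` being defined elsewhere in the script. You would call it with something like `get_pageviews(session, base_url, 3311, start_date_utc, end_date_utc)` after setting `session.headers["Authorization"] = f"Bearer {access_token}"`, where `access_token` is whatever variable holds your token.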
-Chris
Hi @rexj,
If you want to try my version just to see if it produces anything different for you, I'm attaching it here. It should prompt you for all required info, or you can set up your canvas environment info in the file itself. This is using pieces of validation code I made for other more complex projects, which I know makes things more complex but is sometimes handy to find and catch weird input errors.
-Chris
Thanks for this @chriscas
I tried using it for our test instance. I realized that there were a few modules I needed to install. That led me down a trail and now I am looking at re-installing Python and all packages to start fresh. I am also interested in getting the PyCharm IDE running on my machine but have run into issues with that. More than you want to know, I am sure, but I am committed to getting this running. I will let you know where things end up.
I was working through this same problem today. I had previously made a Python script 2 years ago for this same purpose and the pagination worked fine then. Today I was experiencing the same issue you were.
I think it may be a permissions issue with your API token. When I ran the script to pull up my own page views, it worked as expected, returning thousands of rows for the given time frame. When I tried to run it for another user, I would only get the first 100 records.
If I remember correctly - I was a full Canvas Admin during our migration process, which was around the same time I originally created the script. Now I have slightly less admin privileges.
Writing through this post reminded me that there is an "as_user_id" parameter you can send with your request. Adding this parameter to my script let me get the additional records you would expect. This does require you have the "act as" permission.
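Here is a rough sketch of how the parameter can be added to the request URL. The helper name `build_page_views_url` and its arguments are made up for illustration, and I am assuming the same `start_time`, `end_time` and `per_page` parameters used earlier in this thread. The point is that each parameter is written as `name=value` and joined with `&`.

```python
from urllib.parse import urlencode


def build_page_views_url(base_url, user_id, start_time, end_time,
                         as_user_id=None, per_page=100):
    """Build the first page_views URL, optionally adding as_user_id."""
    query = {"start_time": start_time, "end_time": end_time, "per_page": per_page}
    if as_user_id is not None:
        # Only meaningful if the token's owner has the "act as" permission.
        query["as_user_id"] = as_user_id
    # urlencode percent-encodes characters such as ':' in the timestamps.
    return f"{base_url}/users/{user_id}/page_views?{urlencode(query)}"


url = build_page_views_url(
    "https://YOUR-INSTANCE.instructure.com/api/v1",
    3311,
    "2025-02-18T06:00:00Z",
    "2025-02-19T05:59:59Z",
    as_user_id=3311,
)
print(url)
```

Only the first request needs to be built this way. Later pages can use the 'next' URL from the previous response as-is.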
I hope this helps and thank you for helping me figure out stuff on my end!
-Michael
Thank you @mmdones. I appreciate your recommendation.
I tried adding the as_user_id with Canvas ID to the call and still only received 100 results.
I am a root admin. Others outside our institution have run this script successfully and received more than one page of results. I don't know why it seems to work for others and not me.
Trying your 2a script in your original post - I had only received 100 records. When I added the
Thanks @mmdones
I am using Python 3.11.
This is the request URL generated by my script:
https://[domain].instructure.com/api/v1/users/[user_id] /page_views?as_user_id:[user_id}&start_time=2025-02-18T06:00:00+00:00&end_time=2025-02-19T05:59:59+00:00&per_page=100.
Is this formatted correctly?
This resulted in only 100 results.
Thanks for sharing your script. I will give it a try.
@mmdones Questions about your script:
1. Yes, the user.csv is just in a folder called "users_page". You can point to any file location. If you are using the file path address (C:\Users\......\users_page\user.csv), I have found that I need to change the first \ to "C:/Users\".
2. That is just how I taught myself to create files originally. There are better options, like the version in your script. I chose TSV because sometimes there are interesting characters in our page titles that will throw an error when trying to write to a CSV with a comma delimiter. The characters display fine on my Mac, but my Windows machine does not like them.
3. I think I am sending local time, I really only change the date when asked to pull page views.
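On point 2, in case it is useful: a small sketch of writing the rows with Python's `csv` module and an explicit encoding. I am assuming here that the errors come from the default text encoding on Windows rather than from the commas themselves, which I have not confirmed. The function name, the demo file name and the sample row are invented for illustration; `created_at` and `url` are fields on the page view records.

```python
import csv
import os
import tempfile


def write_pageviews_csv(path, rows, fieldnames):
    """Write page view dicts to a CSV file, quoting commas automatically."""
    # utf-8-sig adds a byte order mark so Excel on Windows detects UTF-8.
    # newline="" stops the csv module from doubling line endings on Windows.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


# Demo with a made-up row containing a comma and a non-ASCII character.
demo_rows = [
    {"created_at": "2025-02-18T06:00:00Z",
     "url": "https://example.instructure.com/courses/1/pages/café, week 1"},
]
demo_path = os.path.join(tempfile.gettempdir(), "pageviews_demo.csv")
write_pageviews_csv(demo_path, demo_rows, ["created_at", "url"])

with open(demo_path, newline="", encoding="utf-8-sig") as handle:
    read_back = list(csv.DictReader(handle))
print(read_back)
```

`extrasaction="ignore"` silently drops any keys in a row that are not listed in `fieldnames`, which is convenient when you only want a few columns.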
Follow-up on point 3:
My understanding is that all time stamps in Canvas are UTC. How does this affect your results?
I am only asked to run this script when a team lead or provost asks for the information, which happens once or twice a semester. I just need the start and end dates to encompass the range they are asking for, so I might extend the date range by a day or two. Alternatively, you could just change the time to midnight of the start/end day.
I don't know if it's strictly required when sending data, but Canvas always returns times as ISO 8601 UTC (Zulu) strings (see the Canvas LMS REST API Documentation), so that's the format I always send in my requests.
-Chris
The link I am generating has as_user_id= instead of as_user_id:[user_id}
so the request url would be:
Hi @mmdones
I created a user.csv file in Excel and was getting an error. I edited the file in a text editor adding the comma delimiter. That worked better.
After some testing, I found that the script appears to complete correctly, but the tsv file is empty.
I have entered the time as UTC time:
start_time = ("2025-02-18 06:00:00")
end_time = "2025-02-19 05:59:59"
This time should translate to 20250218 00:00:00 through 20250218 23:59:59 local time (CST).
Related to the as_user_id: When I use it as you have it in the code, it loads the student's ID in both locations. When I do that in a browser I receive a permissions denied message.
When I have the student's ID in the first location, but mine in the act_as location, I receive a page of 100 results.
Hi @rexj,
At least for the time component, I'd recommend using the same ISO 8601 Zulu format Canvas returns, as that always seems to work for me. Instead of "2025-02-18 06:00:00" you'd want to send "2025-02-18T06:00:00Z" and instead of "2025-02-19 05:59:59" you'd use "2025-02-19T05:59:59Z".
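If you would rather start from local times and let Python do the conversion, something like the sketch below should produce those strings. I am assuming a fixed UTC-6 offset for Central Standard Time, which ignores daylight saving time, and `to_canvas_utc` is just a name I picked.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Central Standard Time as a fixed UTC-6 offset (no DST handling).
CST = timezone(timedelta(hours=-6))


def to_canvas_utc(local_dt):
    """Convert a timezone-aware datetime to an ISO 8601 UTC 'Z' string."""
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


start_time = to_canvas_utc(datetime(2025, 2, 18, 0, 0, 0, tzinfo=CST))
end_time = to_canvas_utc(datetime(2025, 2, 18, 23, 59, 59, tzinfo=CST))
print(start_time)  # 2025-02-18T06:00:00Z
print(end_time)    # 2025-02-19T05:59:59Z
```

For dates that fall in daylight saving time you would need a real time zone database (for example the standard library's `zoneinfo`) instead of the fixed offset.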
-Chris
After some testing, I found that the script appears to complete correctly, but the tsv file is empty.
Are you able to print anything to the TSV? Can you try:
print(response.text) or print(response.headers["Link"])
this should give you the json text or the header links for pagination.
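If the raw Link header is hard to read, a small helper like this one can split it into its rel names. This is only a sketch: `parse_link_header` is a made-up name, the sample header is invented, and I am assuming the usual `<url>; rel="name"` layout. The requests library exposes the same information as `response.links`.

```python
import re


def parse_link_header(value):
    """Return a dict mapping each rel name (e.g. 'next') to its URL."""
    links = {}
    # Each entry looks like: <https://...>; rel="next"
    for match in re.finditer(r'<([^>]+)>\s*;\s*rel="([^"]+)"', value):
        url, rel = match.groups()
        links[rel] = url
    return links


# Invented sample header for illustration only.
sample = (
    '<https://example.instructure.com/api/v1/users/1/page_views?page=1>; rel="current",'
    '<https://example.instructure.com/api/v1/users/1/page_views?page=2>; rel="next",'
    '<https://example.instructure.com/api/v1/users/1/page_views?page=1>; rel="first"'
)
parsed = parse_link_header(sample)
print(sorted(parsed))
print("next" in parsed)
```

If "next" is missing from the result on the first response, the loop has nothing to follow, which would explain getting exactly one page.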
Related to the as_user_id: When I use it as you have it in the code, it loads the student's ID in both locations. When I do that in a browser I receive a permissions denied message.
When I have the student's ID in the first location, but mine in the act_as location, I receive a page of 100 results.
Yes, when you use the student's ID in both fields, it is as if you are using the masquerade function, acting as that user. When I do not include the student ID in the act_as location, the header links do not include the next page. They only include first, current, and last.
Is the user you are trying to act as another admin? I receive an error when I try to use another admin with more permissions than me.
@chriscas @mmdones - I really appreciate the help you have offered up to this point, including sharing your code. I haven't been able to get any to work. Are either of you open to a Zoom call to troubleshoot your respective code or my original code?
I am in Central Standard Time, U.S. I know time is precious. No worries if you can't. Thanks.
Hi @rexj,
If you're available on Thursday at 2pm Eastern (so I think 1pm your time), I'd encourage you to join the Instructure Community Developers Group - May 2025 Meetup. There is usually a decent-sized crowd of folks there, and this would be a good discussion/troubleshooting topic (I think I already brought the pagination issue up last month, but I couldn't demo the issue since my code seems to work reliably for me in my instance).
-Chris
Thanks Chris!
I will join the group and the discussion. I have a meeting that day until 1:30, but will join as soon as I can.
Jedidiah
@rexj if you are not able to get it working after the Developer group meeting, maybe we can set up a Zoom call Friday or early next week. I am in CST as well, just let me know. Thanks
Thanks @mmdones !