We currently use the Canvas Analytics API to query page views as a measure of engagement.
Page view counts are typically lower than the per-user counts from the requests table, which makes logical sense. However, the magnitude of this difference bears investigation.
Driving Questions:
- Exactly how is the page views metric calculated?
- How can that calculation be replicated in the requests table?
- What additional "views" information in the requests table is "valid" user activity, and how much is "noise"? (e.g., does one user action on an assignments page produce five rows in the requests table for some reason?)
Some additional context from Ruby P Nugent:
I am looking at the page_views count. The results from Redshift don't match the page_views value from the Analytics API, nor what I see on the Course Analytics page in Canvas. Should I count something different, or add more filters to get the expected results?
Redshift query:
select u.canvas_id as "user id",
c.canvas_id as "course id",
count(r.id)
from requests r
inner join user_dim u on r.user_id = u.id
inner join course_dim c on r.course_id = c.id
inner join enrollment_fact ef on r.user_id = ef.user_id and r.course_id = ef.course_id
inner join enrollment_dim ed on ef.enrollment_id = ed.id
where c.canvas_id in ('565')
and ed.type = 'StudentEnrollment'
and ed.workflow_state <> 'deleted'
group by u.canvas_id, c.canvas_id
order by u.canvas_id
compared to the Analytics API: https://umich.instructure.com//api/v1/courses/565/analytics/student_summaries
Thanks.
Hello Steve,
The requests table actually records much more than Analytics; it can easily be double or triple the volume of page views. There isn't a separate calculation method per se for page views; rather, page views simply don't get created as often. A page view can be generated in multiple ways, but all of them involve the user manually going to a page, clicking a link, etc. In other words, they physically go to an item or click through to another page. The gap is most noticeable with two types of traffic that page views usually miss:
1. Mobile Apps (API Requests).
2. AJAX (asynchronous) Requests.
If you've been with Canvas for a while, you may have noticed that the mobile apps don't always generate a page view for every action that the browser version does. That's because the apps use the API: the user isn't actually visiting a specific page, they're just fetching data, which is then scaffolded into a view inside the mobile app. The same applies to the asynchronous background requests your browser kicks off. For example, when loading more than 100 students, each batch of 100 is actually a separate network request (due to pagination), yet these show up only once in page views, because the user isn't manually clicking anything or navigating anywhere new. They went to the page once, and that's one page view. (Occasionally pagination can trigger multiple page views, but that's rare; for this example, assume it registers one.) All the loading in the background doesn't register a page view, since that would fill up the page views log very quickly with data that isn't useful for analytics except at massive scale (Canvas Data). For the quick analytics built into Canvas, it would just muddy things up.
The requests table, however, records a row for every single request, regardless of whether it comes through the API or an asynchronous AJAX call. Every request that gets processed is added to the requests table. Since the Canvas apps rely on the API, and the browser makes heavy use of asynchronous background requests, your requests table should be much larger than your page views.
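To gauge how much of the gap comes from API and asynchronous traffic, a rough breakdown query along these lines could help. This is a sketch against the Canvas Data schema: it assumes the requests table exposes a url column, uses a simple 'api/v1' substring match to classify API traffic, and reuses the example course id 565 from the query above.

-- Sketch: classify request rows as API vs. non-API traffic for one course,
-- to see how much of the requests volume page views would never count.
select case
         when r.url like '%/api/v1/%' then 'api'
         else 'non-api'
       end as request_type,
       count(*) as request_count
from requests r
inner join course_dim c on r.course_id = c.id
where c.canvas_id in ('565')
group by 1
order by 1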
You can see how it can heavily add up. Because every request is logged, the requests table allows for excellent tracking of the exact steps someone takes through the application. (The only requests you won't see are the extremely rare cases where the app server errors out before it gets to writing the requests table; we still have those internally.)
As for getting the same stats, I'd recommend filtering out the API traffic entirely, i.e., URLs that contain 'api/v1'. That would remove almost all of the mobile app API requests and the AJAX requests (as those usually use the API). Next, I'd filter the user agent down to recognized browsers, for example IE, Chrome, Firefox, Opera, Safari, etc.
This would probably give you the closest estimate for the least work. If you like, you could test which URLs appear in page views and let only those through, but that would be considerably harder for minimal benefit.
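Applying those two filters to the original Redshift query might look like the sketch below. It assumes the requests table exposes url and user_agent columns; the user-agent patterns are illustrative, not exhaustive, and should be tuned against your own data.

-- Sketch: the original per-student count with API traffic excluded and the
-- user agent restricted to common browsers.
select u.canvas_id as "user id",
       c.canvas_id as "course id",
       count(r.id)
from requests r
inner join user_dim u on r.user_id = u.id
inner join course_dim c on r.course_id = c.id
inner join enrollment_fact ef on r.user_id = ef.user_id and r.course_id = ef.course_id
inner join enrollment_dim ed on ef.enrollment_id = ed.id
where c.canvas_id in ('565')
  and ed.type = 'StudentEnrollment'
  and ed.workflow_state <> 'deleted'
  and r.url not like '%/api/v1/%'
  and (r.user_agent like '%Chrome%'
    or r.user_agent like '%Firefox%'
    or r.user_agent like '%Safari%'
    or r.user_agent like '%Trident%'  -- IE 11
    or r.user_agent like '%Opera%')
group by u.canvas_id, c.canvas_id
order by u.canvas_id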