cancel
Showing results for 
Search instead for 
Did you mean: 
audra_agnelly
Community Champion

Canvas Data Page Views

We're looking at some metrics of user activity, one of which is page views. The wiki_page_fact table has a measure called 'view_count'. I assume this is the number of times a page has been viewed but the numbers I'm pulling from Canvas Data are vastly different from Google Analytics data and the page views on a course's analytics page. For example, using Canvas Data, the total number of view_counts's for all published pages in one course adds up to 436,000, while using the data in that course's analytics page, I get a page view total of 68,000. I thought perhaps this was tie to the requests, so I looked at the requests table and calculated the total number of requests made to the course and got 330,000 which is closer to the view_count total but still falls short and it tabulating items other than Canvas pages. What is the view_count value in Canvas Data counting, which value is an accurate reflection of total page views, and what's the best way to get total page view?

8 Replies
robotcars
Community Champion

Audra,

Just curious before I go further. Are you specifically trying to determine the page views of a courses's wiki pages or any area (including modules, assignments, grades, discussions) within the course?

Just for fun I'm going to shuffle in some numbers from a course

For example, using Canvas Data, the total number of view_counts's for all published pages in one course adds up to 436,000

      [ wiki_page_dim; with 76 students, 18 pages; view_count = 5167 ]

...while using the data in that course's analytics page, I get a page view total of 68,000

      [ using the api call below; 137,632 views ]

var course_id = 1234567,
     total = 0;
$.getJSON('/api/v1/courses/'+ course_id +'/analytics/activity', function(r) {
     for (var i in r) {
          total = total + r[i].views
     }
     console.log(total)
})‍‍‍‍‍‍‍‍

...so I looked at the requests table and calculated the total number of requests made to the course and got 330,000

      [ requests; 255,220* views ]

* I run daily cleanup tasks to reduce the overflow of automated-user requests, see below (1).

What I don't know, is how Canvas calculates the view_count in Canvas Data wiki_page_fact. It seems likely it's a transactional database count when the server returns the page to a user.

The API/analytics page_views, probably behaves a bit more like Google Analytics in that a 'pageview' is each time a user views the page.

What I do know (kinda). The requests table is not a transactional table, it is comprised of multiple sources, include web server logs, therefore each row cannot be directly tied to a user action. Records in the requests table happen for user clicks, api calls, LTI integrations, and more. In my work in the requests table I found that thousands of rows can be generated for users where Canvas or its sub-services are performing requests for the user, these include 'ping' requests where I believe Canvas is keeping the users session alive.

Another instance includes my attempts to geo-locate students who roam. I have several use cases where I was able to track users as they traveled, but other cases where users appeared to be physically located in multiple locations at once, and while I have a use case where this actually happened, in most cases it appears cloud computing centers and distributed services were responsible. Also, here in Las Vegas, it seems most AT&T mobile users get an IP bounced out of Southern California, so I have lots of users who seem to be in Las Vegas and LA at the same time.

(1) I have shared some SQL on the community (also with Mr. Taylor at Howard County) something similar what's posted here https://community.canvaslms.com/thread/10993#comment-42118

I would love for Canvas to elaborate on the differences, and/or provide a definition of what records in the requests table are actually user generated.

audra_agnelly
Community Champion

Thanks for the follow up carroll-ccsd‌.  We were trying to determine if course announcements were driving course activity in our elementary schools where Canvas is primarily a communication tool, or if parents are just viewing announcements through email and not engaging with the actual Canvas course. I finally figured out that my query was returning a view_count that was a multiple of the number of course announcements, so now I have numbers that make much more sense, so I do think think it's as you suggest a transactional count when the user loads the page. Just looking at the course statistics page, I think the Pageview in the API call probably doesn't strictly mean pages in terms of Canvas pages, but is counting all URLs the user view: discussions, assignments, etc. looking at the numbers again, the view_count total for the wiki pages in the course is not 6,858 while the analytics API returns a total of 71,030 which is closer to my Google Analytics total of 50,000. 

Audra,

Trying to accurately pull page views from Canvas Data has been an ongoing challenge for me, I appreciate the discussion. Maybe we'll find some understanding in the data. What database are you using? I would like to share some queries in Microsoft SQL, maybe they can help.

I found the following page_view.js is part of Canvas. It increments seconds and stores values to a cookie before the page is unloaded. On the next page, it appears Canvas logs the time on the previous page. This is probably how they get the users time spent, and they only seem to be recording during mouse and keyboard input. I wish these counters could be part of Canvas Data.

/*
* Copyright (C) 2011 - present Instructure, Inc.
*
* This file is part of Canvas.
*
* Canvas is free software: you can redistribute it and/or modify it under
* the terms of the GNU Affero General Public License as published by the Free
* Software Foundation, version 3 of the License.
*
* Canvas is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
* A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
* details.
*
* You should have received a copy of the GNU Affero General Public License along
* with this program. If not, see <http://www.gnu.org/licenses/>.
*/


import INST from './INST'
import $ from 'jquery'
import './jquery.ajaxJSON'

  $(document).ready(function(){
    var interactionSeconds = 0,
        update_url = window.ENV.page_view_update_url;
        eventInTime = false;

    INST.interaction_contexts = {};

    if(document.cookie && document.cookie.match(/last_page_view/)) {
      var match = document.cookie.match(/last_page_view=([^;]+)/);
      if(match && match[1]) {
        try {
          var data = $.parseJSON(unescape(match[1]));
          if(data && data.url && data.seconds) {
            setTimeout(function() {
              $.ajaxJSON(data.url, "PUT", {interaction_seconds: data.seconds}, function() {
              }, function() {}, 3000);
            });
          }
        } catch(e) {
        }
      }
      document.cookie = "last_page_view=; Path=/; expires=Thu, 01-Jan-1970 00:00:01 GMT";
    }

    if (update_url) {
      var secondsSinceLastEvent = 0;
      var intervalInSeconds = 60 * 5;

      $(document).bind('page_view_update_url_received', function(event, new_update_url) {
        update_url = new_update_url;
      });

      var updateTrigger;
      $(document).bind('page_view_update', function(event, force) {
        var data = {};

        if(force || (interactionSeconds > 10 && secondsSinceLastEvent < intervalInSeconds)) {
          data.interaction_seconds = interactionSeconds;
          $.ajaxJSON(update_url, "PUT", data, null, function(result, xhr) {
            if(xhr.status === 422) {
              clearInterval(updateTrigger);
            }
          });
          interactionSeconds = 0;
        }
      });

      updateTrigger = setInterval(function() {
        $(document).triggerHandler('page_view_update');
      }, 1000 * intervalInSeconds);

      window.addEventListener('beforeunload', function(e) {
        if(interactionSeconds > 30) {
          var value = JSON.stringify({url: update_url, seconds: interactionSeconds});
          document.cookie = "last_page_view=" + escape(value) + "; Path=/;";
        }
      });

      var eventInTime = false;
      $(document).bind('mousemove keypress mousedown focus', function() {
        eventInTime = true;
      });
      setInterval(function() {
        if(eventInTime) {
          interactionSeconds++;
          if(INST && INST.interaction_context && INST.interaction_contexts) {
            INST.interaction_contexts[INST.interaction_context] = (INST.interaction_contexts[INST.interaction_context] || 0) + 1;
          }
          eventInTime = false;
          if(secondsSinceLastEvent >= intervalInSeconds) {
            secondsSinceLastEvent = 0;
            $(document).triggerHandler('page_view_update');
          }
          secondsSinceLastEvent = 0;
        } else {
          secondsSinceLastEvent++;
        }
      }, 1000);
    }
  });



// WEBPACK FOOTER //
// ./public/javascripts/page_views.js

Robert, 

We're running SQL Server but using Tableau to run our queries and analysis. I ran some other filters on the request table in the query below and Am now getting a request total of 67,379 which is much closer to the total I pulled from the API call (71,030). I did pull out all the API calls to that course, but I think that might be problematic in that now I'm filtering out requests made by the mobile apps, no? I know that our DBA has also done some additional filtering on our data based on the query you shared with him. 

SELECT
    requests.url AS 'URL'

FROM requests

WHERE
    requests.url LIKE '/courses/75099/%'
    AND requests.url NOT LIKE '%record_answer%'
    AND requests.url NOT LIKE '%backup%'
    AND requests.url NOT LIKE '/api/%'

Audra,

That's part of my dilemma in pushing any of these queries to an actual solution. Without a full understanding of some of the requests, it spoils the water on any result. Like you said, most mobile traffic would be API calls, but so are some of the desktop requests.

Maybe we should try combining API calls with User Agent and ignore them from Desktop Browsers?

Here's a query I thought might help with understanding user roles viewing content areas of a course.

-- course-user-role-access-pivot-community.sql
-- course page requests, content area by user, enrollment type
-- regular joins
DECLARE @course_id BIGINT = 100000001234567;

SELECT
     user_id,
     enrollment_type,
     ISNULL(analytics,0) analytics,
     ISNULL(announcements,0) announcements,
     ISNULL(assignments,0) assignments,
     ISNULL(discussions,0) discussions,
     ISNULL(enroll_users,0) enroll_users,
     ISNULL(external_content,0) external_content,
     ISNULL(external_tools,0) external_tools,
     ISNULL(gradebook,0) gradebook,
     ISNULL(grades,0) grades,
     ISNULL(homepage,0) homepage,
     ISNULL(modules,0) modules,
     ISNULL(pages,0) pages,
     ISNULL(quizzes,0) quizzes,
     ISNULL(rubrics,0) rubrics,
     ISNULL(settings,0) settings,
     ISNULL(users,0) users
    
FROM (
     SELECT user_id, enrollment_type, content_area, count(content_area) hits FROM (
         
          SELECT user_id, enrollment_type, content_area FROM (
               SELECT
                    requests.user_id,
                    --uet.enrollment_type,
                    ed.type AS enrollment_type,
                    requests.timestamp_day,
                    requests.session_id,
                    CASE
                         WHEN PATINDEX('/courses/%/analytics', url) >= 1 THEN 'analytics'
                         WHEN PATINDEX('/courses/%/announcements%', url) >= 1 THEN 'announcements'
                         WHEN PATINDEX('/courses/%/assignments%', url) >= 1 THEN 'assignments'
                         WHEN PATINDEX('/courses/%/conversations/%', url) >= 1 THEN 'conversations'
                         WHEN PATINDEX('/courses/%/discussion_topics%', url) >= 1 THEN 'discussions'
                         WHEN PATINDEX('/courses/%/enroll_users', url) >= 1 THEN 'enroll_users'
                         WHEN PATINDEX('/courses/%/external_content/%', url) >= 1 THEN 'external_tools_content'
                         WHEN PATINDEX('/courses/%/external_tools/%', url) >= 1 THEN 'external_tools_content'
                         WHEN PATINDEX('/courses/%/grades%', url) >= 1 THEN 'grades'
                         WHEN PATINDEX('/courses/%/gradebook%', url) >= 1 THEN 'gradebook'
                         WHEN PATINDEX('/courses/%/modules%', url) >= 1 THEN 'modules'
                         WHEN PATINDEX('/courses/%/pages%', url) >= 1 THEN 'pages'
                         WHEN PATINDEX('/courses/%/quizzes/%', url) >= 1 THEN 'quizzes'
                         WHEN PATINDEX('/courses/%/rubric_associations%', url) >= 1 THEN 'rubrics'
                         WHEN PATINDEX('/courses/%/settings%', url) >= 1 THEN 'settings'
                         WHEN PATINDEX('/courses/%/users%', url) >= 1 THEN 'users'
                         ELSE 'homepage'
                    END AS 'content_area'
              
               FROM CanvasLMS.dbo.requests
               --JOIN CanvasLMS.dbo.user_enrollment_type_vw uet ON requests.user_id = uet.user_id
               INNER JOIN CanvasLMS.dbo.course_section_dim csd ON requests.course_id = csd.course_id
               INNER JOIN CanvasLMS.dbo.enrollment_dim ed ON ed.course_section_id = csd.id AND requests.user_id = ed.user_id
               WHERE requests.course_id = @course_id
                    AND requests.user_id IS NOT NULL
                    --AND enrollment_type = 'student'
                    --AND ed.type = 'StudentEnrollment'
                    AND PATINDEX('/api/v1/%',url) = 0
                    AND web_application_controller NOT IN ('files','folders')
                    AND web_application_action NOT IN ('backup')
          ) y
          GROUP BY user_id, enrollment_type, timestamp_day, session_id, content_area
     )x
     GROUP BY user_id, enrollment_type, content_area

) s
PIVOT
(
    SUM(hits)
    FOR content_area IN (
               analytics,
               announcements,
               assignments,
               discussions,
               enroll_users,
               external_content,
               external_tools,
               gradebook,
               grades,
               homepage,
               modules,
               pages,
               quizzes,
               rubrics,
               settings,
               users
          )
) AS pvt
GO‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Sample Result

user_id

enrollment_type

analytics

announcements

assignments

discussions

enroll_users

external_content

external_tools

gradebook

grades

homepage

modules

pages

quizzes

rubrics

settings

users

1

observer

0

7

36

15

0

0

0

0

43

0

1

0

15

0

0

0

2

observer

0

3

23

4

0

0

0

0

39

31

0

0

7

0

0

0

3

observer

0

0

0

0

0

0

0

0

6

7

0

0

0

0

0

0

4

observer

0

0

2

0

0

0

0

0

2

0

0

0

1

0

0

0

5

observer

0

1

5

1

0

0

0

0

5

0

0

0

3

0

0

0

6

student

0

1

1

1

0

0

0

0

1

2

1

1

1

0

0

0

7

student

0

12

25

19

0

0

0

0

30

41

56

0

55

0

0

0

8

student

0

1

69

13

0

0

0

0

55

40

21

0

53

0

0

0

9

student

0

1

43

6

0

0

0

0

44

41

19

0

39

0

0

0

10

student

0

3

52

4

0

0

0

0

10

18

51

0

58

0

0

2

11

student

0

1

0

1

0

0

0

0

0

4

3

0

0

0

0

0

12

student

0

0

1

0

0

0

0

0

0

1

1

0

1

0

0

0

13

student

0

2

68

6

0

0

0

0

66

4

27

0

47

0

0

1

14

student

0

12

42

15

0

0

0

0

21

20

48

1

42

0

0

0

15

student

0

2

29

7

0

0

0

0

2

65

82

0

63

0

0

0

16

student

0

10

95

27

0

0

0

104

31

134

52

7

37

26

7

19

17

student

0

0

1

0

0

0

0

0

0

0

0

0

0

0

0

0

18

student

0

19

38

24

0

0

0

0

47

33

39

0

36

0

0

0

19

student

0

1

38

6

0

0

0

0

41

41

18

0

36

0

0

0

20

student

0

4

8

4

0

0

0

0

10

10

6

3

5

0

0

1

21

teacher

0

10

95

27

0

0

0

104

31

134

52

7

37

26

7

19

Thanks for this great discussion. Robert Carroll I have a question about your thought, "What I don't know, is how Canvas calculates the view_count in Canvas Data wiki_page_fact. It seems likely it's a transactional database count when the server returns the page to a user."

Might that view_count include page views in public courses, where users do not have Canvas accounts? I wonder if you've learned any more in the last 2 years. Since there is no user information in the table about who did the viewing, I am hoping there is some usage data being stored for public courses.

I don't know how it calculates view_count, but I do see view_counts in wiki_page_fact for public only courses.

Using the query below in #comment-101576, for the sampled course, I get page views from Requests for all roles.

#comment-101576 

Thanks much Robert, that is hopeful.  I am going to do some testing.