It's not designed for what we're trying to use it for
- almost anyone, including the Canvas Data team
The Canvas Data Portal - Requests documentation states
Pageview requests. Disclaimer: The data in the requests table is a 'best effort' attempt, and is not guaranteed to be complete or wholly accurate. This data is meant to be used for rollups and analysis in the aggregate, _not_ in isolation for auditing, or other high-stakes analysis involving examining single users or small samples. As this data is generated from the Canvas logs files, not a transactional database, there are many places along the way data can be lost and/or duplicated (though uncommon). Additionally, given the size of this data, our processes are often done on monthly cycles for many parts of the requests tables, so as errors occur they can only be rectified monthly.
When we started looking into this data's use, we provided a similar disclaimer.
Then SHEBENE would consistently add, "This is a conversation starter, not a smoking gun."
The requests table is unlike any other table in Canvas Data. With the exception of Pageviews, all other tables record events that are triggered when a teacher or student saves something in Canvas, such as when a teacher creates a page or assignment, or when a student submits an assignment or takes a quiz.
Pageviews are different: a user might click links on a website from the moment the page loads until they stop clicking. Each of these events is useful.
It might tell us
And much more
Except the requests table doesn't just contain clicks, it contains logs. Anyone who has ever seen web server logs knows that any transaction or request over HTTP is logged in whatever detail the engineers decide suits their needs. To give you some scope, here is a list of HTTP response status codes - HTTP | MDN. You would expect any of these to be a line for each request made to the server, along with the URL, timestamp, IP address, user agent, and more.
Some of it's useful; most of it's completely undocumented. I once tried compiling a spreadsheet to catalog my best effort at understanding the various web_application_controller, web_application_action, and web_application_context_type values. Most of them appear to be routes. Canvas is nice and links up the user, the course (if the path is under /courses), and some other useful information.
Part of the problem with the requests table is the beauty of the web and the Canvas LMS REST API. The API that allows Canvas developers to integrate their institution, extend Canvas, or create tools is the same API that Canvas itself is built on. This means that any requests to the server made by Canvas are also logged, not just clicks or transactions made by the user.
Here's the best way I can demonstrate. Open Canvas and go to the Dashboard.
Right click the page and choose Inspect or Inspect Element, to open the Developer Tools - for the Canvas User
Click on the Network tab, then the Clear button. Cleared? Good.
Now start moving your mouse over the interface, here's some targets.
Now look at the Network tab again. Most of your actions were clicks or even just hovering.
Look closely, do you see the unread_count? This is not something you performed, this was Canvas checking for new messages in your inbox to update the flag in the navigation.
The problem with the requests table is what also makes Canvas a great LMS: "Born in the Cloud".
This, and LTIs. LTIs and more are hosted outside of Canvas in the Cloud, creating a lot of noise in the table: rows which contain requests not triggered by the user or by Canvas, which we might not need for these purposes.
I found this because, like many of you, we have full time ** students. One of my early questions was geared toward understanding if all our users were local, or if they roamed. Can we make instructors aware of when students are traveling? Can we be empathetic to timezone differences? To answer this, I used some of the many geolocation APIs on the web to collect the location data of the remote_ip values in the requests table. At first I was extremely impressed with how many students we had traveling. Then I counted... there were too many students.
To check, I compared the requests against the Pseudonym Dim - Canvas Data Portal, which contains the user's last_login_ip and current_login_ip.
Here's an overlay of logins vs. requests in Tableau. Student Logins Noise
I generated this map to share at Hack Night. Before that, we generated a map for NVLA with just student logins.
As you can see, the physical location of a user is different from some of their requests. If you understand the Cloud, then you can also see that a traveling student starts triggering cloud services in the regions they travel to. It's also possible that a student sitting at home using Canvas on their laptop, while also using their phone, can have a mobile IP address from another state. Solved: IP address in another state? - Verizon Fios Community
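The login-versus-request comparison can be sketched in a few lines of Python. This is a minimal illustration, not the queries behind the map: it assumes the pseudonym and requests rows have already been loaded as dictionaries keyed by the column names mentioned above, and the function names are made up for this example.

```python
from collections import defaultdict


def login_ips_by_user(pseudonym_rows):
    """Map each user_id to the set of login IPs found in pseudonym rows."""
    ips = defaultdict(set)
    for row in pseudonym_rows:
        for field in ("last_login_ip", "current_login_ip"):
            ip = row.get(field)
            if ip:
                ips[row["user_id"]].add(ip)
    return ips


def split_requests(request_rows, ips):
    """Separate requests whose remote_ip matches a known login IP from the rest."""
    matched, noise = [], []
    for row in request_rows:
        if row["remote_ip"] in ips.get(row["user_id"], set()):
            matched.append(row)
        else:
            noise.append(row)
    return matched, noise
```

Anything that lands in the second list is not necessarily junk; it is only a request from an address the user never logged in from, which is where the cloud and mobile traffic tends to show up.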
Along with the web_application_* fields, URL paths like /api, /ping, /pageviews, and others make filtering out the massive amount of data that grows in the requests table difficult. Let's say you want to try anyway: check out Requests Table and the discussion about how to host and handle the large table, and filter or delete rows.
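As a rough sketch of that kind of filtering, here is one way to flag background traffic by URL path in Python. The prefix list is an assumption built only from the paths named above; a real list would need tuning against your own data, and dropping everything under /api will also drop some requests you may care about.

```python
from urllib.parse import urlsplit

# Assumed list of path prefixes to treat as background traffic.
NOISE_PREFIXES = ("/api", "/ping", "/pageviews")


def is_background(url):
    """Return True when the URL path equals or sits under a noise prefix."""
    path = urlsplit(url).path
    return any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in NOISE_PREFIXES
    )


def user_facing(request_rows):
    """Keep only rows whose url is not flagged as background traffic."""
    return [row for row in request_rows if not is_background(row["url"])]
```

Matching on whole path segments keeps a path such as /pingpong from being dropped by accident.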
OK, Let's Try
Here's a scenario. A common question in the Community.
Daily User Activity in a Course
where
- course_id is not null and course_id = #
- user_id is not null and user_id = #
grouping by
- course_id
- user_id
- timestamp: the complete date time, helps narrow down sessions
- timestamp_day: quickly group by day - redundant, but really happy they provide this
- session_id: helpful for trying to separate windows of user activity, this helps reduce idle time from our collection
- remote_ip: identifies the user on the internet/location, this can change throughout the day, it also helps separate sessions
If a user walks away from the screen while Canvas is open, Canvas will run /ping requests that keep things alive. We can use session_id and remote_ip in order to attempt to filter data for active sessions. If you don't filter, and remove inactive requests, you will likely end up with data that shows user activity for hours, all day, sometimes multiple days.
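One way to turn that idea into numbers is sketched below in Python. It is an illustration under stated assumptions, not the query linked in this post: rows are dictionaries with timestamp, session_id, remote_ip and url keys, timestamps are ISO-style strings, and the 10 minute idle cutoff is an arbitrary choice.

```python
from collections import defaultdict
from datetime import datetime, timedelta

IDLE_CUTOFF = timedelta(minutes=10)  # assumed threshold, not a Canvas setting


def active_minutes(rows, idle_cutoff=IDLE_CUTOFF):
    """Estimate active minutes per (day, session_id, remote_ip).

    /ping rows are skipped, and any gap between consecutive requests that is
    longer than idle_cutoff is treated as idle time and not counted.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["url"].split("?")[0].endswith("/ping"):
            continue
        stamp = datetime.fromisoformat(row["timestamp"])
        groups[(stamp.date(), row["session_id"], row["remote_ip"])].append(stamp)

    totals = {}
    for key, stamps in groups.items():
        stamps.sort()
        active = timedelta()
        for earlier, later in zip(stamps, stamps[1:]):
            gap = later - earlier
            if gap <= idle_cutoff:
                active += gap
        totals[key] = active.total_seconds() / 60
    return totals
```

A session with a single request counts as zero minutes here, so this tends to underestimate short visits.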
Breaking it up into sessions, with a stacked bar chart (minutes on the y-axis)
compared to a student with less activity
The plot line, showing average student activity for the course
Let's zoom out, to all students in the course
This at least measures all users equally. Whether it's fully accurate is questionable, and mobile?
Back to 'conversation starters, not smoking guns'.
Here's a query - How Do I Determine Time Spent on Site #comment-97617
Nevada Learning Academy at CCSD uses this data, along with course activity by hour and submission times, to identify when students are active and to schedule Live Sessions for the most popular time of day or weekday. A teacher with full time and part time students can schedule sessions when the most students will be available, or do more and split days and hours to be available for different groups. Teachers can flex their time to make these accommodations.
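A small sketch of the activity-by-hour idea in Python, counting distinct users per weekday and hour. It assumes rows with timestamp and user_id keys, and it does no timezone conversion, so the timestamps would need shifting to local time before the hours mean anything to a teacher. The function name is invented for this example.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def busiest_slots(rows, top=3):
    """Return the (weekday, hour) slots with the most distinct active users."""
    seen = set()
    for row in rows:
        stamp = datetime.fromisoformat(row["timestamp"])
        # A user is counted once per slot, no matter how many requests they made.
        seen.add((DAYS[stamp.weekday()], stamp.hour, row["user_id"]))
    counts = Counter((day, hour) for day, hour, _ in seen)
    return counts.most_common(top)
```

Counting users instead of raw requests keeps one very busy student from making a slot look popular.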
From here, you can expand queries and join tables to do a decent number of user analytics.
Here's some examples, I will try to update, add, and curate.
Where does that bring us? We can keep trying to filter and define the data, helping make it more manageable, and a little more accurate for these purposes. But maybe there's another way?
I will share more in a future post, but this is relevant now
Live Events is an experimental beta feature from Canvas, which sends messages to AWS Simple Queue Service. The messages are events and transactions, which are consumable in real time. During Hack Night, a member of the Canvas Data Team stated the paraphrased quote at the top of this post, adding that Live Events is a better way of dealing with requests and events. What about both?
I also had an opportunity at Hack Night to discuss this issue with some Canvas Engineers. While I have some other use cases for this, which I will share at a later time, my only request was to add the IP address of the user to the login event. I have been after the IP of each login for about 2 years now. With the IP of the login, we can filter out or specifically collect just the requests of the user's computer* instead of the noise, getting us closer to user activity and clicks.
* You might ask, why not just use the last_login_ip and current_login_ip from the pseudonym table?
- Canvas Data compiles once a day, 1 row per user. If the user logs into 3 or more devices, something is lost.
I invite any questions, comments, or contributions below; adapt the queries, post results, maybe a visualization.
What questions does this table help answer?
CCSD Canvas Team
** I have tried adding 'o-n-l-i-n-e' to this sentence a dozen times. Jive keeps removing it!
Also getting removed before LMS. What gives Jive?
Just one note. Live Events are no longer experimental and have been in production since Feb this year. It's worth being aware that the live events listed in the Canvas documentation are only the extensions to the Caliper standard; all standard Caliper events are available too, including navigation events.
True. It is in production, but you must request the feature from a CSM. The documentation still states it is experimental and invite only. Live Events (experimental) - Canvas LMS REST API Documentation
Live Events is currently an invite-only, experimental feature.
Maybe that needs to be updated, maybe not...
In my experimentation with Live Events, the docs, Caliper, and the available community resources don't answer all of the questions involved in using it. Additionally, I have an open request (since early June) for more documentation and for an explanation of some inconsistencies in the event messages that engineers are still trying to figure out. I see you're a solutions engineer; if you can help me get those answers, please check with @jsailor , she has my list, or DM me. I'd love to move forward. I'm trying to use Live Events to append to Canvas Data daily, so our LTI is real time.
This was further expressed when I went to review the documentation this week and found that the fields for the logged_in event have been reduced to 1, when there are more, and I'm using them.
This is why I didn't provide a link to the documentation in the original post, and tried to emphasize that it's still experimental. I did this so people don't go crazy trying to figure it out. I am also planning another post about Live Events, and intend to include some code as a starter kit, to help let it loose. :smileygrin:
I'd appreciate anything you can share.
I've noticed the same thing and raised the fact it still says experimental with the Docs team a little while ago. The API docs live with engineering, and they are in the process of updating their entire doc process, so it may be a little while before it gets corrected. That said, when they finish it will make keeping the API docs up to date and accurate a lot easier. For reference, Live Events being in production was noted in the Canvas release notes: https://community.canvaslms.com/docs/DOC-14130-canvas-production-release-notes-2018-02-17
To be honest, what I know on the Live Events front I have learned via similar experimentation. Working mostly in pre-sales support, I sadly don't have direct engineering input, but I often advise customers and help find solutions for them during the sales process, so I needed a deeper knowledge of Live Events. Documentation on Live Events will, I hope, improve over time as more people use it; it's one of the few areas where there is a gap right now. I have great admiration for the Doc team, who keep up with so much stuff, especially with all the changes and new features in the last 6 months.
What I do know is that one of the quickest and easiest ways for me to explore Live Events was to use Splunk, as it has a connector built in for SQS, so I was quickly able to grab the live events and start to look at them, sort them, and query them. It really is a very nice tool for log and event analysis. I hadn't seen any inconsistencies, but then again I am working with a relatively small sample set against a demo system. What were you seeing?
A little off topic, but I love the discussion, maybe we can branch these replies later?
I love the Doc Team too, and everything Canvas is always improving. :smileygrin:
I've installed and played with Splunk a little bit, but it's a little out of reach for K-12. I also have my hands full with only 2 developers, and sometimes a few short scripts is all I have time to make improvements. My Live Events code is 3 files, loading events into a database.
Here are two of my issues.
Submission Created: submissionCreated.actor, submissionCreated.object, submissionCreated.group
Submission Updated: submissionUpdated.actor, submissionUpdated.object, submissionUpdated.group
For submission updated events, it seems some of those messages are the Course Instructor updating a submission. These records often have the user_login id, a group which includes the course information, and membership showing me the user role. However, with a student submission update event, I get only the user's canvas id and the assignment and submission id. No course or role. I can deal with that programmatically, but the documentation suggests I should have all 3.
The membership data, which contains the role of the user and the course, is missing in 25% of submission_created event messages and 52% of submission_updated, based on 24 hours of polling those events.
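For anyone who wants to run the same check on their own events, here is a rough Python sketch of how a missing-membership percentage could be tallied. It is not my 3 files: it assumes each polled message has already been parsed into a dictionary, and the event_name and membership keys are placeholder names for this example, so adjust them to whatever your messages actually contain.

```python
from collections import Counter


def missing_membership_share(events):
    """Return the percentage of events per event name with no membership data."""
    total, missing = Counter(), Counter()
    for event in events:
        name = event["event_name"]  # placeholder key, see the note above
        total[name] += 1
        if not event.get("membership"):  # absent, None, or empty all count
            missing[name] += 1
    return {name: 100 * missing[name] / total[name] for name in total}
```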
lti_assignment_id
Assignment Created/Updated: The LTI assignment guid for the assignment
Submission Created/Updated: The LTI assignment guid of the submission's assignment
When looking at submission events that include LTI links, I don't see anything that shows these values.
Maybe the LE docs just need to note these as optional in some events? Then I'll know to work around it.
I also haven't had time to vet these issues since before InstructureCon. I really need to get my 3 files up so maybe the Community coders can help. I mostly think it's just a lot of information to consume, test, compare against the docs, and figure out. I have plenty of other things on my plate while I patiently wait for answers from busy engineers. I'd rather get a good product than immediate gratification. :smileygrin:
That is an interesting issue. It may be the source from where these events are being generated is related to the data being missing. 25% sounds like a fairly typical mobile usage number. I know that there is a lot of work being done by the mobile team to get the app requests into the CanvasData requests logs properly, I'd keep an eye on this to see if any work there changes those numbers. When I get some spare time (in the distant future probably) I'll have to experiment and see what I can find.
Also I spoke to a couple of people in engineering, the doc marked experimental is a leftover from beta testing period and this is the official doc now. https://community.canvaslms.com/docs/DOC-14162-what-additional-caliper-extensions-are-available-in-c... + the IMSGlobal Caliper doco.
It may be the source from where these events are being generated is related to the data being missing.
I had tried looking that up too, but didn't get very far. I think this is where it becomes JSON.
Thanks! What version of Splunk are you using?
I was using the Enterprise trial running locally on my laptop (7.1, I think), but it's now expired. I'm pretty sure the free version still has the AWS SQS connector, so I will see if I can get a license for that and give it a go.