Greetings,
I'm curious how others are using and managing the Canvas web_logs data and activity reporting.
There's a lot of genuine, if often vague, interest in this data for things like:
State policy requires us to retain students' academic data for 2 years. We're hopeful that data and reports within Canvas can meet this requirement, but when we noticed that the DAP API only keeps web_logs for the past 30 days, we realized we may need to start pulling and archiving this data sooner rather than later.
But we're wary of keeping this data around for longer than is necessary.
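One way to reconcile the 30-day DAP window with a 2-year retention period is a daily check over the archive. The Python below is a minimal sketch, assuming (hypothetically) that web_logs are archived as one partition per calendar day and that 2 years is approximated as 730 days; `plan_archive_actions` is an illustrative name, not part of any Canvas or DAP tooling. It flags partitions old enough to delete, missing days that DAP can still serve, and missing days that can no longer be recovered.

```python
from datetime import date, timedelta

# Assumptions (hypothetical): one archived partition per calendar day,
# 2-year retention approximated as 730 days, and DAP serving ~30 days back.
RETENTION_DAYS = 730
DAP_WINDOW_DAYS = 30


def plan_archive_actions(archived_days, today):
    """Return (expired, backfillable, lost) for a collection of archived dates.

    expired      -- archived days older than the retention period (delete candidates)
    backfillable -- missing days still inside the DAP window (pull them now)
    lost         -- missing days inside retention but outside the DAP window
    """
    archived = set(archived_days)
    retention_start = today - timedelta(days=RETENTION_DAYS)
    dap_start = today - timedelta(days=DAP_WINDOW_DAYS)

    expired = sorted(d for d in archived if d < retention_start)

    backfillable, lost = [], []
    day = retention_start
    # Walk every day in the retention period; today's partition is not expected yet.
    while day < today:
        if day not in archived:
            (backfillable if day >= dap_start else lost).append(day)
        day += timedelta(days=1)
    return expired, backfillable, lost
```

Running something like this on a schedule shorter than 30 days keeps the "backfillable" list from silently turning into "lost" days, while the "expired" list covers the concern about keeping data longer than necessary.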
So, we're just curious how others use and manage this data. How long do you keep it? What have you done that's worked well? What other resources have you found for doing similar things?
Hi @bliszewski -- We have also found that Canvas data is large, difficult to work with, and doesn't really provide much insight into actual student learning behaviors, despite many people's fervent hope that it will (at least at an enterprise level). That said, we do push our web_logs data to a Redshift table (I think we have a little under three billion records so far for CD2 web_logs). Because we pull the data down to an S3 bucket first, we still have all of the downloaded files (both CD2 and CD1), so in that sense we have retained all of the data. It's a lot of data, but S3 storage is not a huge expense in the grand scheme of things.
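If the raw files live in S3, a lifecycle rule is one way to keep them only as long as policy requires. This is a sketch under assumptions: the `web_logs/` prefix, the 90-day transition to the `GLACIER` storage class, and the 730-day expiration are illustrative values, not a description of the setup described above, and `web_logs_lifecycle` is a hypothetical helper name.

```python
# Hypothetical sketch: build an S3 lifecycle configuration that moves archived
# web_logs files to a cheaper storage class and deletes them after ~2 years.
# Prefix and day counts are illustrative assumptions.


def web_logs_lifecycle(prefix="web_logs/", archive_after_days=90, expire_after_days=730):
    if archive_after_days >= expire_after_days:
        raise ValueError("objects must transition before they expire")
    return {
        "Rules": [
            {
                "ID": "web-logs-retention",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move older files to Glacier to reduce storage cost.
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                # Delete the files once the retention period has passed.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }
```

The returned dict has the shape boto3 expects for `LifecycleConfiguration`, so it could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="your-bucket", LifecycleConfiguration=web_logs_lifecycle())` (the bucket name is a placeholder). Expiring the raw files does not remove rows already loaded into Redshift; those would need their own cleanup.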
In terms of "learning analytics" using this data, because we have so much variety in how instructors design their course spaces and the extent to which they use various Canvas tools, most of what we report on from web_logs might just be considered "system usage". For example, we report the number of unique users (i.e., unique user_ids) who showed activity (i.e., had records in web_logs) each day over the course of a semester.
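The daily unique-user count is straightforward to compute. Here is a minimal Python sketch, assuming each web_logs record has already been reduced to a `(user_id, timestamp)` pair; in Redshift the equivalent would be a `COUNT(DISTINCT user_id)` grouped by the date of the timestamp. Day boundaries follow whatever timezone the timestamps are in, so converting to local time first may matter for a semester report.

```python
from collections import defaultdict
from datetime import datetime


def daily_unique_users(records):
    """Count distinct user_ids with at least one record per calendar day.

    `records` is any iterable of (user_id, datetime) pairs (an assumed shape).
    Returns a dict of date -> count, ordered by date.
    """
    users_by_day = defaultdict(set)
    for user_id, ts in records:
        users_by_day[ts.date()].add(user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


sample = [
    (1, datetime(2024, 9, 3, 9, 15)),
    (1, datetime(2024, 9, 3, 14, 0)),   # same user, same day: counted once
    (2, datetime(2024, 9, 3, 20, 30)),
    (2, datetime(2024, 9, 4, 8, 0)),
]
print(daily_unique_users(sample))
# {datetime.date(2024, 9, 3): 2, datetime.date(2024, 9, 4): 1}
```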
Even with non-web_logs tables, I think that, in many cases, the data tells us more about course requirements than anything else. For example, if an instructor requires their students to respond to at least three posts by other students in the discussion, you will likely see most students dutifully responding to three posts. In other words, the data will provide some insight into student compliance, but will likely not shed light on the kind of questions that educators really care about, such as how students interact with new ideas, co-construct knowledge, etc. (I'm speaking at a general level -- I do think the data become more useful when you are working with a specific instructor who is able to provide context and may have specific expectations of usage, e.g., after a course redesign.)
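To make the compliance point concrete, here is a hypothetical sketch: given the author of each reply in one discussion and the list of enrolled students, it tallies how many students posted each number of replies. The input shapes, the function name, and the `required=3` default are assumptions for illustration only. A spike at exactly the required count mostly reflects the assignment's rules rather than how students engaged with the ideas.

```python
from collections import Counter


def reply_count_distribution(reply_authors, enrolled, required=3):
    """Tally replies per enrolled student in one discussion (assumed inputs).

    Returns (distribution, share_at_requirement): distribution maps a reply
    count to the number of students with that count; share_at_requirement is
    the fraction of enrolled students who posted exactly `required` replies.
    """
    per_student = Counter(reply_authors)
    # Counter returns 0 for students who never replied.
    counts = Counter(per_student[student] for student in enrolled)
    share = counts[required] / len(enrolled) if enrolled else 0.0
    return dict(sorted(counts.items())), share
```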
Also, I don't know anything about specific state policies, but I would hope that something like web_logs would not be considered academic data that needs to be maintained. (But I am not a lawyer or state official.)
Just my two cents.
Best,
Martyn