Canvas activity and web_logs retention

I'm curious how others are using and managing the Canvas web_logs data and activity reporting.

There's a lot of interest in this data, often vague but genuine, for things like:

  • Querying course activity for administrative drops, refund processing, or cases where someone claims to have done something, like submitting an assignment, but no such artifact exists in Canvas.
  • Monitoring service health and security, such as picking out spikes in errors or unusual activity.
  • Usage and engagement insights.

State policy requires us to maintain academic data for students for 2 years. We're hopeful this can be met with the data and reports within Canvas itself, but when we noticed that the DAP API only keeps web_logs going back 30 days, we realized we may need to start pulling and archiving this data sooner rather than later.

But we're wary of keeping this data around for longer than is necessary.

  • It's large. Our Canvas Data 1 requests archive is some 170G of compressed text (750G uncompressed), and we're not even a year into using Canvas.
  • It's difficult to work with. Yes, we can see that a request was made, but if you don't understand the context of what that request means within Canvas it's not actually very useful.
  • It doesn't always mean what you think it does, or want it to. You can't measure engagement by requests alone because some activities make more requests than others, and some people just prefer (or need) to keep their work outside of the web browser and only use the LMS for the things they have to.
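
To make the "no longer than necessary" part concrete, here is a rough sketch of how we imagine pruning a local archive once files age out of the retention window. It assumes we'd store one compressed file per day named like web_logs_YYYY-MM-DD.jsonl.gz; that naming scheme and the function name are our own invention for illustration, not anything Canvas or DAP defines, and the sketch says nothing about how the files get pulled in the first place.

```python
from datetime import date, timedelta
import re

# Hypothetical naming scheme for daily archive files (an assumption,
# not a Canvas or DAP convention).
ARCHIVE_NAME = re.compile(r"^web_logs_(\d{4})-(\d{2})-(\d{2})\.jsonl\.gz$")


def expired_archives(filenames, today, retention_days=730):
    """Return the archive filenames dated before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in filenames:
        match = ARCHIVE_NAME.match(name)
        if match is None:
            continue  # leave files we don't recognize alone
        year, month, day = (int(part) for part in match.groups())
        try:
            file_date = date(year, month, day)
        except ValueError:
            continue  # name looks right but is not a real calendar date
        if file_date < cutoff:
            expired.append(name)
    return sorted(expired)
```

The default of 730 days is only an approximation of the 2-year policy mentioned above. The function just returns candidates, so something else would decide whether to delete them, move them to cold storage, or report them for review.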

So, we're just curious how others use and manage this data. How long do you keep it? What have you done that's worked well? What other resources have you found for doing similar things?