Community help

sor1 · ‎12-04-2023

I recommend that the planned change to remove the user_agent_id column from web_logs be cancelled for now and deferred to a later time. Instead, just provide null contents for the column.

Changing table structure affects those of us who maintain ongoing replicas of CD2 tables and breaks the attendant processing, especially if .csv format is used. Using parquet or json format reduces some of the immediate trauma; however, at some point the reconstituted version of the table still needs to be dropped and recreated.
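To illustrate the point about formats, here is a minimal sketch of reading .csv exports by column name rather than by position, so that a file without user_agent_id still loads with a null in that slot. The column list, sample rows, and the `align_rows` helper are all hypothetical; only user_agent_id comes from this thread, and the real web_logs definition will differ.

```python
import csv
import io

# Hypothetical target schema for a local replica. Only user_agent_id is taken
# from the discussion; the other column names are illustrative.
TARGET_COLUMNS = ["id", "timestamp", "user_agent_id", "url"]


def align_rows(csv_text, target_columns=TARGET_COLUMNS):
    """Yield one dict per row containing exactly the target columns.

    Columns absent from the file are filled with None; columns not in the
    target list are dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield {col: row.get(col) for col in target_columns}


# Made-up sample exports: one written before the column was removed, one after.
old_export = "id,timestamp,user_agent_id,url\n1,2023-12-01T00:00:00Z,42,/courses/1\n"
new_export = "id,timestamp,url\n2,2024-11-20T00:00:00Z,/courses/2\n"

rows = list(align_rows(old_export)) + list(align_rows(new_export))
```

This only softens the load step. The replica table itself keeps the old column until it is dropped and recreated, which is the cost described above.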

More importantly, the web_logs table is a special case. Some of us had been using the requests table for time series analysis and have converted to using web_logs by adding converted requests table records to the 30-day web_logs snapshot. Approximately twenty steps are needed in my environment to add the additional columns and recreate the history. The 12/16/2023 change will probably need as many. Just remember that after the new snapshot is taken, we need to drop and recreate the incremental tables and change the procedures that apply the incremental records to the original snapshot.
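For readers unfamiliar with the snapshot-plus-incremental pattern mentioned here, the following is a rough sketch of applying change records to a snapshot held in memory. The record shape (an "action" field with "upsert" or "delete", and a "row" dict keyed by "id") is an assumption made for illustration and is not the CD2 file format.

```python
def apply_incremental(snapshot, changes, key="id"):
    """Return a new dict of rows after applying the changes in order.

    snapshot maps key value -> row dict. Each change is a dict with a
    hypothetical "action" field ("upsert" or "delete") and a "row" dict.
    """
    result = dict(snapshot)
    for change in changes:
        row = change["row"]
        if change["action"] == "delete":
            # Deleting a key that is not present is treated as a no-op.
            result.pop(row[key], None)
        else:
            result[row[key]] = row
    return result


# Made-up data for illustration only.
snapshot = {1: {"id": 1, "url": "/a"}}
changes = [
    {"action": "upsert", "row": {"id": 2, "url": "/c"}},
    {"action": "upsert", "row": {"id": 1, "url": "/b"}},
    {"action": "delete", "row": {"id": 2}},
]
merged = apply_incremental(snapshot, changes)
```

When a column is removed upstream, both the snapshot rows and the change rows have to agree on the new shape, which is why the incremental tables and the apply procedures need to be rebuilt together.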

Note that the elimination of the user_agents table would not be a significant issue.

marco_divittori · ‎12-10-2024

It looks like the user_agent_id column was finally dropped in Nov 2024 instead of on 2023-12-16 as indicated in the release notes. This has caused a significant amount of turmoil for our data processing procedures, which has taken us weeks to untangle, and we're still not done. It would have been a non-issue if a null value had been provided instead, as @sor1 suggested. Has anyone else been negatively impacted by this change?

sgergely · ‎12-12-2024

Hey, thanks for reporting your issue. I would like to improve our data schema rollout process, so can you please help me understand some things?
We hadn't been sending incremental data for the user_agents table for a while. Why didn't you stop syncing it when the announcement was made?
The user_agents table was simply dropped, so no change was made to a table schema, such as adding or removing a column, which could cause challenges in a huge database. What is the turmoil around that? Can you please describe it in more detail so I can understand it better?

marco_divittori · ‎12-12-2024

@sgergely I'm referring to the removal of the web_logs.user_agent_id column referenced in the release notes I linked above, not the user_agents table. We store web_logs data in S3 so we had to invest a significant amount of time to identify when the schema change occurred (since it was scheduled for Dec 2023 and then arrived unexpectedly) and adjust our data files. We couldn't simply re-initialize the table without losing our web_logs history due to the 30-day retention policy.
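One way to locate the point where a column disappeared from a set of stored files is to compare each file's header against the expected column, in chronological order. The sketch below assumes the headers have already been read into a dict and that file names sort chronologically (for example, because they embed an ISO date). The file names, column lists, and the `first_file_missing_column` helper are hypothetical.

```python
def first_file_missing_column(headers_by_file, column):
    """Return the first file name (in sorted order) whose header lacks column.

    headers_by_file maps file name -> list of column names.
    Returns None if every file still has the column.
    """
    for name in sorted(headers_by_file):
        if column not in headers_by_file[name]:
            return name
    return None


# Made-up file names and headers for illustration only.
headers_by_file = {
    "web_logs/2024-11-01.csv": ["id", "timestamp", "user_agent_id", "url"],
    "web_logs/2024-11-08.csv": ["id", "timestamp", "user_agent_id", "url"],
    "web_logs/2024-11-15.csv": ["id", "timestamp", "url"],
    "web_logs/2024-11-22.csv": ["id", "timestamp", "url"],
}

changed_at = first_file_missing_column(headers_by_file, "user_agent_id")
```

Files from `changed_at` onward can then be adjusted to match the earlier files, or the earlier files rewritten to the new shape, depending on which schema the history should follow.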

sor1 · ‎12-13-2024

I just wanted to share some experience and thoughts. Note that our environment is Snowflake:

Our processing uses json (parquet would be just as good) to partially insulate us from column changes. We have also maintained web_logs records since CD1, which required converting requests records to web_logs; this gives us history since 2020. We are also considering maintaining a history of incremental update records of other tables to permit us to approximate the state of Canvas Data at times in the past. For processing performance, we refresh the table snapshots every term.

web_logs - Please cancel planned change to remove user_agent_id on 2023-12-16
