Community help

phanley · 09-18-2025

The issue: we currently pull reports from api/v1/reports several times a day, and the person responsible for those jobs has complained to me on multiple occasions that they take a long time and often get stuck or fail because of the size of the reports.

It looks like all of the data in those reports is available via Canvas Data, so I thought I would write the needed queries and move to updating multiple times a day (I currently sync once per day). However, my workflow triggers a number of Kubernetes pods (one per table, plus the job that fetches the table list, so 91 in total), which seems wasteful if the data hasn't been updated yet.

Is there a queryable data source, accessible through either the DAP CLI or the API, that is similar to the meta table that syncdb uses? Basically, I'm looking for a way to get the timestamp of the last completed update from our production instance that I can use as a trigger for a workflow.

I had considered implementing a manual task that runs an incremental query from the last retrieved timestamp on one of the bigger tables, and then triggers the entire workflow if it returns more than 0 records, but I'm not sure whether all the tables are updated at the same time.
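A minimal sketch of the probe idea above, assuming a hypothetical `count_new_rows(table, since)` callable that you would implement yourself (for example by wrapping an incremental query). Nothing here calls a real DAP API, and the table names are only examples. As the replies below point out, a zero count on the probe table does not prove the other tables are unchanged.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical probe: returns the number of rows in `table` changed since
# `since`. How you implement it (e.g. wrapping a DAP incremental query) is
# an assumption; this sketch does not call any real DAP API.
CountNewRows = Callable[[str, datetime], int]


def should_trigger_full_sync(
    probe_table: str, last_synced: datetime, count_new_rows: CountNewRows
) -> bool:
    """Gate the whole workflow on one cheap incremental probe.

    True means the probe table has new rows. False does NOT prove the other
    tables are unchanged, because CD2 tables may be updated independently.
    """
    return count_new_rows(probe_table, last_synced) > 0


if __name__ == "__main__":
    last_synced = datetime(2025, 9, 18, 6, 0, tzinfo=timezone.utc)
    # Stub counts standing in for real incremental query results.
    fake_counts = {"submissions": 1250, "courses": 0}

    def fake_probe(table: str, since: datetime) -> int:
        return fake_counts.get(table, 0)

    print(should_trigger_full_sync("submissions", last_synced, fake_probe))  # True
    print(should_trigger_full_sync("courses", last_synced, fake_probe))  # False
```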

Pete5484 · 09-22-2025

I think this thread may be useful: Canvas Data 2 - Incremental "until" to maintain referential integrity

Since it's streamed for 'eventual consistency', you can't infer that an update to one table means another has also been updated (i.e., it's not a read replica). The one thing you can rely on is that data at least 4 hours old should be in the tables.
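One way to act on the 4-hour figure is to choose a single cutoff of `now - 4h` and pull every table only up to that point, so all tables end at the same settled moment. This is a minimal sketch, assuming the 4-hour lag holds as a rule of thumb rather than a documented guarantee; `settled_window` and `SETTLE_LAG` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Rule of thumb from the reply above: data at least 4 hours old should be
# present in every table. Treating this as exact is an assumption.
SETTLE_LAG = timedelta(hours=4)


def settled_window(
    last_synced: datetime, now: datetime, lag: timedelta = SETTLE_LAG
) -> Optional[Tuple[datetime, datetime]]:
    """Return a (since, until) window shared by all tables.

    `until` is capped at now - lag so every table is pulled up to the same,
    presumably settled, point. Returns None if nothing new has settled.
    """
    until = now - lag
    if until <= last_synced:
        return None
    return last_synced, until


if __name__ == "__main__":
    now = datetime(2025, 9, 22, 12, 0, tzinfo=timezone.utc)
    # Window from 06:00 to 08:00 UTC.
    print(settled_window(datetime(2025, 9, 22, 6, 0, tzinfo=timezone.utc), now))
    print(settled_window(datetime(2025, 9, 22, 9, 0, tzinfo=timezone.utc), now))  # None
```

Persisting the returned `until` and passing it as `last_synced` on the next run keeps consecutive windows from overlapping or leaving gaps.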

sgergely · 09-23-2025

Great question!

The data behind CD2 is updated every 4 hours, provided there is new data to add. Since usage differs between Canvas instances, and features are used differently, there is no easy one-size-fits-all solution for you.

What I would do is incrementally check only the tables that are important for my data needs.
I would also look at the history to see which tables give me new data every 4 hours and which don't. Then I would know which tables to check frequently (every 4 hours) and which need only a once-a-day incremental update check.
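The history-based approach can be sketched as follows. It assumes you log, per table, how many new rows each past 4-hour check returned. The 50% threshold, the function names, and the sample counts in the usage example are made up for illustration, not real CD2 observations.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Mapping, Sequence

FREQUENT = timedelta(hours=4)  # matches the 4-hour CD2 update cadence
DAILY = timedelta(days=1)


def assign_intervals(
    history: Mapping[str, Sequence[int]], threshold: float = 0.5
) -> Dict[str, timedelta]:
    """Map each table to a check interval.

    `history` maps table name -> new-row counts seen at past 4-hour checks.
    Tables that returned new rows in at least `threshold` of those checks
    are checked every 4 hours; the rest once a day.
    """
    intervals: Dict[str, timedelta] = {}
    for table, counts in history.items():
        hits = sum(1 for c in counts if c > 0)
        hit_rate = hits / len(counts) if counts else 0.0
        intervals[table] = FREQUENT if hit_rate >= threshold else DAILY
    return intervals


def tables_due(
    intervals: Mapping[str, timedelta],
    last_checked: Mapping[str, datetime],
    now: datetime,
) -> List[str]:
    """Return tables whose interval has elapsed (or that were never checked)."""
    due = []
    for table, interval in intervals.items():
        last = last_checked.get(table)
        if last is None or now - last >= interval:
            due.append(table)
    return sorted(due)


if __name__ == "__main__":
    # Illustrative counts only.
    history = {
        "submissions": [120, 98, 0, 143, 77, 60],
        "courses": [0, 0, 3, 0, 0, 0],
        "enrollments": [5, 0, 8, 2, 0, 1],
    }
    intervals = assign_intervals(history)
    now = datetime(2025, 9, 23, 12, 0, tzinfo=timezone.utc)
    last_checked = {
        "submissions": datetime(2025, 9, 23, 8, 0, tzinfo=timezone.utc),
        "courses": datetime(2025, 9, 23, 0, 0, tzinfo=timezone.utc),
        "enrollments": datetime(2025, 9, 23, 10, 0, tzinfo=timezone.utc),
    }
    print(tables_due(intervals, last_checked, now))  # ['submissions']
```

Only the tables returned by `tables_due` would then need a pod, instead of launching all of them on every run.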

With dap cli, lightweight method to determine if data has been updated since last syncdb

