
With dap cli, lightweight method to determine if data has been updated since last syncdb

phanley
Community Contributor

The issue: we currently pull reports from api/v1/reports several times a day, and the person responsible for those jobs has complained to me on multiple occasions that they take a long time and often get stuck or fail because of the size of the reports.

It looks like all of the data in those reports is available via Canvas Data, so I thought I would write the needed queries and move to updating multiple times a day (I currently sync once per day). However, my workflow triggers a number of Kubernetes pods (1 per table plus the job that fetches the table list, so 91), which seems wasteful if the data hasn't been updated yet.

Is there a queryable data source, accessible through either the DAP CLI or the API, that is similar to the meta table syncdb uses? Basically, I'm looking for a way to get a timestamp for the last completed update from our production instance that I can use as a trigger for a workflow.

I had considered implementing a manual task that attempts an incremental snapshot of one of the bigger tables, starting from the last retrieved timestamp, and then triggers the entire workflow if it returns more than 0 records, but I'm not sure whether all the tables are updated at the same time.
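
For illustration, here is a minimal Python sketch of that probe idea. It is not a tested implementation. The names `should_trigger_sync` and `count_changed` are hypothetical: `count_changed` stands in for whatever wrapper runs an incremental query against one table and reports how many records changed since a given timestamp, and no real DAP call is shown. Because I don't know whether all tables update together, the sketch checks a short list of probe tables rather than a single one.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable

# Hypothetical callable: given a table name and a timestamp, return how many
# records an incremental query reports as changed since that timestamp.
CountChanged = Callable[[str, datetime], int]


def should_trigger_sync(
    probe_tables: Iterable[str],
    last_sync: datetime,
    count_changed: CountChanged,
) -> bool:
    """Return True as soon as any probe table reports changed records."""
    for table in probe_tables:
        if count_changed(table, last_sync) > 0:
            return True  # stop early; one changed table is enough
    return False


if __name__ == "__main__":
    # Stand-in counter with made-up numbers, used only to exercise the logic.
    fake_counts = {"submissions": 0, "enrollments": 12}

    def fake_count_changed(table: str, since: datetime) -> int:
        return fake_counts[table]

    last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)
    if should_trigger_sync(["submissions", "enrollments"], last_sync, fake_count_changed):
        print("changes detected, trigger the full workflow")
    else:
        print("no changes, skip this run")
```

With this shape, one small job would run the probe and the remaining pods would only start if it returns True. It assumes that a handful of busy tables is a reasonable proxy for the whole dataset, which is exactly the part I'm unsure about.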
