Community help

burkepk · ‎03-29-2024

Hello all, our team is new to developing solutions to pull data from Canvas. I have been lurking in the forums for a bit, trying to gather as much information as I can about the tools we would need to be successful. The one part I am still wondering about is where to run the Python functions that handle the initialization and synchronization calls. We want to stay away from EC2 because of the administrative overhead involved, but we are also apprehensive about the 15-minute time limit on Lambda. Has anyone containerized their code to run on Fargate? Or is the 15-minute limit in Lambda enough for processing the synchronizations, so that we would only need a one-off process for the initializations?

ColinMurtaugh · ‎03-30-2024

Hi --

We've had success running our CD2 init/sync code in Lambda and orchestrating the process using Step Functions. Currently we're syncing everything in the canvas schema every three hours, and the process has been running for a couple of months without problems. A few of the tables (fewer than 5, IIRC) are large enough that the init step took longer than 15 minutes -- for those we just ran a one-off init outside of Lambda, and subsequent syncs have been fine.

Here's a link to a work-in-progress version of our pipeline code. This is essentially a slightly simplified version of the process that we run ourselves; I have a little work to do to apply some of our recent updates to the public version of the code, but you can get a sense of how it works:

https://github.com/Harvard-University-iCommons/canvas-data-2-aws/tree/develop

--Colin
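
As a rough illustration of the Lambda-plus-Step-Functions pattern described above (a sketch only, not taken from the linked repository), a handler can work through a list of tables and stop starting new ones when the Lambda time budget runs low, handing the remainder back so the orchestrator can invoke it again. The names `sync_tables`, `sync_one_table`, `SAFETY_MARGIN_MS`, and the `event["tables"]` shape are all hypothetical; the only AWS call relied on is the Lambda context method `get_remaining_time_in_millis()`. The actual DAP sync call is left as a placeholder to fill in with whatever client code you use.

```python
from typing import Callable, Iterable, List, Tuple

# Assumption: stop picking up new tables once less than a minute remains.
SAFETY_MARGIN_MS = 60_000


def sync_tables(
    tables: Iterable[str],
    sync_one: Callable[[str], None],
    remaining_ms: Callable[[], int],
    margin_ms: int = SAFETY_MARGIN_MS,
) -> Tuple[List[str], List[str]]:
    """Sync tables in order until the time budget runs low.

    Returns (done, pending): tables that were synced, and tables that
    were not started because too little time was left.
    """
    done: List[str] = []
    pending: List[str] = list(tables)
    while pending and remaining_ms() > margin_ms:
        table = pending.pop(0)
        sync_one(table)
        done.append(table)
    return done, pending


def sync_one_table(table: str) -> None:
    """Hypothetical placeholder: run your DAP sync for `table` here."""
    raise NotImplementedError("plug in your DAP client sync call")


def handler(event, context):
    """Lambda entry point; the event shape {"tables": [...]} is an assumption."""
    done, pending = sync_tables(
        event["tables"],
        sync_one_table,
        context.get_remaining_time_in_millis,
    )
    # A Step Functions Choice state could loop back while "finished" is False.
    return {"done": done, "tables": pending, "finished": not pending}
```

Note that this guard only avoids starting a table late in the invocation. A single table whose init or sync takes longer than 15 minutes on its own would still time out, which matches the case above where a few large tables were initialized outside of Lambda.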

burkepk · ‎04-01-2024

Thank you very much for your insight.

Setting up DAP Synchronization in AWS, Fargate or Lambda?

AWS

DAP client library

python
