Frequency of push of Live Events to AWS?

MatthewBernacki
Community Member

Hi!

I'm working on a project where I send Live Events (asset_accessed) to an AWS S3 bucket and then forward them to an endpoint where I process them. I'd like to know how long it takes from the moment an asset is accessed in a Canvas LMS site to the moment the event lands at my endpoint. I can tune how often I pull from S3, so I already know that part of the delay. My question:

How frequently does Canvas push Live Events to AWS? and is this adjustable?

(Or, alternatively, is this a pull process from AWS back to Canvas, and can I shorten its interval?)

 

James
Community Champion

@MatthewBernacki 

Live events are sent to the AWS SQS Queue as they happen. This is not adjustable.

When they become available to you in the SQS Queue on AWS is a different issue and that is controlled by AWS rather than by Canvas. See How Amazon SQS works for more information.

In short, Canvas may have sent the message and AWS may have received it, but it's still not immediately available to you. Yes, I know that sounds confusing. The Amazon SQS short and long polling page attempts to explain. With short polling, SQS queries only a subset of its servers, so you might not get any messages back the first time (making you think you have cleared out the queue), but subsequent requests will return your messages when there are fewer than 1,000 of them in the queue. I opted for long polling since it is supposed to save money and I am paying for this out of pocket rather than charging the school for it (it's usually less than $2 a month for what I'm using it for, which includes the asset_accessed event).

You can poll as frequently (or infrequently) as you like, but there are retention limits that are configurable on AWS, so you will want to make sure that you continue to poll regularly or the queue items will get purged. You get billed for making requests, so when I set up my polling, I set it to run every 5 minutes (we're a small institution) and then attempt to clear out the queue on each run. During slow times, I make calls that return no results. During busy times, things get delayed by more than 5 minutes. The batch methods only allow you to retrieve 10 items at a time, so you cannot tell AWS to "give me everything you have"; you need to retrieve and then delete, over and over. Right now, I have no clear purpose for gathering the information other than using it for a project in my stats class, so I merely duplicate the information in a local database and then clear it from the SQS queue.
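For anyone who wants to see the retrieve-then-delete loop spelled out, here is a minimal Python sketch. It is not my actual script. It assumes a boto3 SQS client (or anything with the same `receive_message` and `delete_message_batch` methods), and the function name `drain_queue`, the `handle` callback, and the queue URL in the comment are made up for illustration.

```python
from typing import Any, Callable


def drain_queue(sqs: Any, queue_url: str, handle: Callable[[str], None],
                wait_seconds: int = 20) -> int:
    """Receive and delete messages until a long poll comes back empty.

    `sqs` is assumed to be a boto3 SQS client, e.g. boto3.client("sqs").
    Returns the number of messages processed.
    """
    processed = 0
    while True:
        # Long polling: wait up to `wait_seconds` (SQS allows at most 20)
        # for messages. SQS returns at most 10 messages per request.
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=wait_seconds,
        )
        messages = resp.get("Messages", [])
        if not messages:
            return processed  # nothing arrived within the wait window

        for msg in messages:
            handle(msg["Body"])  # e.g. copy the event into a local database

        # Delete only after handling, so a crash does not lose events.
        # A fuller version would inspect the "Failed" list in the response.
        sqs.delete_message_batch(
            QueueUrl=queue_url,
            Entries=[
                {"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]}
                for m in messages
            ],
        )
        processed += len(messages)


# Hypothetical usage from a job scheduled every 5 minutes:
#   import boto3
#   drain_queue(boto3.client("sqs"), "https://sqs.../my-live-events", print)
```

Every `receive_message` call is a billed request, including the final empty one, which is why a longer polling interval combined with a full drain tends to be cheaper than constant short polls.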

Live Events are not guaranteed, though. Canvas makes a "best effort" to deliver them. Have a backup plan for getting the data (if possible).


robotcars
Community Champion

Hi @MatthewBernacki

Live Events should be sent from Canvas in Real Time.

Live Events sent from Canvas to an AWS SQS queue arrive, in my experience, with about 1-3 seconds of latency. That was what I saw using my own SQS to SQL tool, and again more recently when testing SQS > Lambda/SQS Trigger > Kinesis > S3. However, that second method pools events, say 1000 at a time, before it writes a file to S3, which can add latency to your process if you're looking for real time. So for asset_accessed, you should technically be able to browse Canvas and watch S3 continuously update with new objects from that traffic with only a slight delay.
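If you want to measure that latency on your own queue, one rough approach is to compare the timestamp inside each event with the time you received it. The sketch below is in Python and assumes each message body is JSON with an ISO 8601 `metadata.event_time` field; check that against the payloads you actually receive. The function name `event_latency_seconds` is made up for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def event_latency_seconds(raw_body: str,
                          received_at: Optional[datetime] = None) -> float:
    """Seconds between the event's own timestamp and when we received it.

    Assumes the body is JSON with metadata.event_time in ISO 8601 form,
    e.g. "2024-01-01T00:00:00.500Z". Verify this against your own events.
    """
    event = json.loads(raw_body)
    stamp = event["metadata"]["event_time"]
    # Older Python versions do not parse a trailing "Z", so normalize it.
    sent = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    received = received_at or datetime.now(timezone.utc)
    return (received - sent).total_seconds()
```

Clock skew between Canvas and your consumer will show up in this number, so treat it as an estimate rather than an exact measurement.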

I do not believe it is adjustable, and much depends on how you consume the SQS queue. If you consume the events from the queue in batch jobs, say every hour, the events are real time but your processing is not. Also, if you're querying the data with something like Athena, you may have to handle a Catalog update for the table with a Glue job or Crawler, which, if scheduled, is not real time either, and your data will be batched at that interval. I have not tested Glue's Streaming ETL jobs, as they seemed cumbersome if all 87 Live Events are desired rather than 1; the same goes for 87 Kinesis Streams. With a single streamed event type it might be easier to manage.

It's that delay that has me working on consuming Live Events into an application database first, to be used in real time, and then moving those events to S3 for storage once they are stale (way past real time), for long-term or historical analysis and reporting.
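As a rough illustration of that database-first idea, here is a small Python sketch using SQLite as a stand-in for the application database. It is not my actual tool. The table name `live_events` and the functions `open_store`, `store_event`, and `pop_stale` are all hypothetical, and the upload to S3 is left to the caller.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hot store for recent events."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS live_events ("
        "id INTEGER PRIMARY KEY, event_time REAL NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def store_event(conn: sqlite3.Connection, event_time: float, body: str) -> None:
    """Insert one event; event_time is a Unix timestamp in seconds."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO live_events (event_time, body) VALUES (?, ?)",
            (event_time, body),
        )


def pop_stale(conn: sqlite3.Connection, cutoff: float) -> List[str]:
    """Remove and return the bodies of events older than `cutoff`.

    The caller would write the returned bodies to S3 for long-term storage.
    """
    with conn:
        rows = conn.execute(
            "SELECT id, body FROM live_events WHERE event_time < ? "
            "ORDER BY event_time",
            (cutoff,),
        ).fetchall()
        conn.executemany(
            "DELETE FROM live_events WHERE id = ?", [(r[0],) for r in rows]
        )
    return [r[1] for r in rows]
```

A sturdier version would delete rows only after the S3 write has been confirmed, so a failed upload does not drop events.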

 
