I'm interested in confirming some assumptions about how file views and downloads (for individual files) are represented -or not- in Live Events (both formats), specifically:
1- It appears to me that in both cases there is no information whatsoever about file views, even though in theory there *could* be (for example, in the Caliper formatted events, one could use the 'NavigatedTo' event to represent such file view data).
2- While it is clear to me that file downloads *are* represented in the Raw Live Event format (see the table in the asset_accessed event: type, subtype and triggers suggest this), it is not clear whether the Caliper formatted events (with an action type of 'NavigatedTo' and an asset_type of 'attachment') represent the same information or not. (There's no documentation I could find on this, but I suspect that this is the case.)
And finally a question:
Is it safe to assume that the totality of file downloads is in fact represented by live events in each of the formats?
(in other words, would it be safe to depend on live events ONLY to find all such downloads?)
I can't answer your question fully just yet, so I'm going to make a start and come back to it later, or let others jump in.
I would say that the events emitted for file attachments in #asset_accessed can be expected to be similar between formats. Whether events represent all file downloads of attachments is unknown at the moment.
Caliper and Alpha (Raw) formatted events are mostly the same when the event_type is the same. The difference lies in the limited set of Caliper fields that can be sent with the message vs the comparable fields we'd expect to see in Canvas Data, the API, or even the Canvas UI.
With regard to asset_accessed events, the 2 formats should be nearly the same once you account for the differences stated above. Below is how I validate these assumptions by collecting both formats. Live Events are currently emitted by the code, not by the data layer, which means that some messages may fail to get sent to your queue. So there is a margin of error. It would be nice if file_fact had a count, but I'm not seeing it.
Here is the query to collect a count for a file resource by id in both formats. You can see that the Caliper format does not offer a filename. You can also see that between formats, for the same file id, the counts are almost the same: usually equal or off by 1, and sometimes by more. The counts could be off because the messages weren't sent, or because they weren't ingested correctly.
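For anyone without Redshift, the same per-file comparison can be sketched in Python. This is only an illustration under assumptions: the rows are hand-made, and the id field names (`asset_id` for the Raw format, `object_id` for Caliper) are placeholders for whatever your own ingestion stores, not confirmed column names.

```python
from collections import Counter

def count_by_file(events, id_field):
    """Count events per file id; events without the id field are skipped."""
    return Counter(e[id_field] for e in events if e.get(id_field) is not None)

def count_gaps(raw_counts, caliper_counts):
    """Return {file_id: raw_count - caliper_count} for ids whose counts differ."""
    ids = set(raw_counts) | set(caliper_counts)
    # Counter returns 0 for ids it has never seen, so a file missing from one
    # format shows up as a full-size gap instead of raising KeyError.
    return {i: raw_counts[i] - caliper_counts[i]
            for i in ids if raw_counts[i] != caliper_counts[i]}

# Hypothetical sample rows: file 101 was seen twice in Raw but once in Caliper.
raw_events = [{"asset_id": "101"}, {"asset_id": "101"}, {"asset_id": "202"}]
caliper_events = [{"object_id": "101"}, {"object_id": "202"}]

gaps = count_gaps(count_by_file(raw_events, "asset_id"),
                  count_by_file(caliper_events, "object_id"))
print(gaps)  # prints: {'101': 1}
```

A positive gap means the Raw format has more events for that file, a negative gap means Caliper has more.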
The numbers in both LE formats are comparable, but when I tried to compare and look at the Requests table I got lost in a few places.
In LE Caliper format I found that many events have an actor_type of SoftwareApplication rather than Person. This field doesn't exist in Alpha format. However, Caliper#actor_user_login is present when actor_type is Person, and Alpha#user_login_meta is comparable. Using these fields in a WHERE clause I get the same row counts for each day. I then kept trying to identify the differences between LE and CD.requests, and I think I need to pull this into Tableau to get a better visual of the fields and values.
I'm currently sitting on a LEFT JOIN on LE.request_id = CD.requests.id with WHERE requests.id IS NULL or IS NOT NULL... Quite a number of events don't appear to be present in requests, and some requests don't have events.
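The same matched/unmatched split can be sketched outside SQL with plain sets. This is a rough illustration, assuming you have already pulled the request ids out of both sources; the function name and the sample ids are made up.

```python
def split_by_request_match(event_request_ids, request_ids):
    """Partition ids the way the LEFT JOIN with IS NULL / IS NOT NULL does."""
    events = set(event_request_ids)
    requests = set(request_ids)
    return {
        # LEFT JOIN ... WHERE requests.id IS NOT NULL
        "matched": events & requests,
        # LEFT JOIN ... WHERE requests.id IS NULL
        "events_without_request": events - requests,
        # Needs the join in the other direction (requests LEFT JOIN events)
        "requests_without_event": requests - events,
    }

# Hypothetical ids, for illustration only.
result = split_by_request_match(["a", "b", "c"], ["b", "c", "d"])
for name, ids in result.items():
    print(name, sorted(ids))
```

Note that a single LEFT JOIN from the events side only yields the first two groups; finding requests that have no event needs the reverse join or a FULL OUTER JOIN.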
Thanks Robert for doing all that work! It didn't really occur to me that I could try to run equivalent queries (I don't have access to Redshift, but I could run queries via an AWS Athena instance if needed). On the other hand, I do have access to both types of live events via Splunk right now, and I could certainly run some searches that compare the results given by the two formats to see if they show the same coverage (I don't have access to the requests table within my Splunk data though). But yes, it's more work to figure out whether the totality of the download data shows up in live events, as you mentioned. As you said, there could always be some events missing in there.
One quick comment regarding the Caliper format and filenames: turns out that, as long as we're limiting ourselves to the NavigatedTo type of action, and with an asset_type of 'attachment', the format *does* offer a filename, although probably in a different place than expected. I see it in the object.extensions.com.instructure.canvas.filename (there's also asset_name and display_name at the same level within the object.extensions, which also appear to carry the same value as filename).
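In case it helps with updating code, here is a minimal sketch of pulling that filename out of a parsed Caliper event. It assumes that in the JSON payload `com.instructure.canvas` is a single dotted key under `object.extensions` (the path above is how it looks once flattened); the function name and the sample event are made up.

```python
def caliper_filename(event):
    """Return the file name from a Caliper event, or None if it is absent."""
    obj = event.get("object") or {}
    # Assumption: "com.instructure.canvas" is one key, not three nested levels.
    ext = (obj.get("extensions") or {}).get("com.instructure.canvas") or {}
    # filename, display_name and asset_name appear to carry the same value,
    # so fall back through them in that order.
    for key in ("filename", "display_name", "asset_name"):
        value = ext.get(key)
        if value:
            return value
    return None

# Hypothetical, trimmed-down event for illustration only.
sample = {
    "action": "NavigatedTo",
    "object": {
        "extensions": {
            "com.instructure.canvas": {
                "asset_type": "attachment",
                "filename": "syllabus.pdf",
            }
        }
    },
}
print(caliper_filename(sample))  # prints: syllabus.pdf
```

Returning None instead of raising makes it easy to count how many events are missing the field.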
It would be great to hear from somebody from Instructure too, as they might have some information handy to answer these questions (and hopefully with less work!).
You have all the tools! I have a Splunk T-shirt I got from re:Invent. I wear it on days when I try to figure out how to do crazy things that Splunk would make easier. On the front it says "Because the ninjas are too busy." Alas, I have no funds for such toys. Maybe I can pass some questions your way when I get in over my head?
Thanks for the comment about filename and display_name, it appears I need to update my code for Caliper to detect missing fields. I watch the source code for Canvas hoping to find those changes but was not aware they had added new points. Documentation around this service needs an upgrade. I'm hopeful that's around the corner.
I know the goal is to have Live Events kind of be the replacement of the Requests table for things like user interaction. With the ability to separate user events from system events and divide the task it makes much more sense for Live Events to answer questions about user activity and access. I'm not sure if thumbnails and images in the body of a page are considered assets or if it's just downloaded attachments. The Requests table shows the web server access to files, which shows images served. I'm just not sure if #AssetAccessed includes all the interactions yet or if there are limits to what it's collecting.
Let's ping oxana
Sure Robert, feel free to send your Splunk questions over and I'll give them a shot (at least I'll try to do so). BTW, thanks for mentioning the issue with thumbnails and images, which I hadn't thought about. One possibility might be to restrict the events collected to some filename extensions and not others, perhaps? (That may work well as long as instructors don't upload a bunch of plain images as files in their sites though :-(). I will check to see if there's something that distinguishes those events (with respect to other attributes) from the other types of file download events. BTW, running the comparison searches on Splunk which count the number of live events in both formats, I get the same results as you do, meaning that they seem to be off by at most 1 (sometimes equal, sometimes *almost* equal). I agree that pinging Oxana is a great idea. Thanks again!
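To make the extension idea concrete, here is a rough sketch of such a filter. The extension list is my own guess at what counts as a "plain image", and the function name is made up; as noted, it would misclassify images that instructors upload as regular course files.

```python
import os

# Assumed set of image extensions; adjust to whatever should be excluded.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}

def is_probable_image(filename):
    """True when the file name ends in one of the assumed image extensions."""
    _, ext = os.path.splitext(filename or "")
    return ext.lower() in IMAGE_EXTENSIONS

# Hypothetical file names, for illustration only.
names = ["syllabus.pdf", "photo.JPG", "week1.docx", "banner.png"]
downloads = [n for n in names if not is_probable_image(n)]
print(downloads)  # prints: ['syllabus.pdf', 'week1.docx']
```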
pgo586 and @carrollrw we recently released our event documentation to the Data Services Beta instance. It hasn't been announced in the release notes yet (it was warmfixed to make sure we get it out to production by the December Canvas release timeline). Take a look at the Canvas and Caliper 1.1 event payload schemas to see how the file download events are represented.
pgo586 we emit an event for every user-triggered file download action taken in Canvas via browser or mobile app. That said, we don't provide any SLO when it comes to delivering every single event. I hope this answers your question.