Hi pgo586
I can't answer your question just yet, so I'll share what I've found so far and come back to it later, or let others jump in.
So far...
I would say that the events emitted for file attachments in #asset_accessed can be expected to be similar between formats. Whether these events capture every download of an attachment is unknown at the moment.
Caliper and Alpha (Raw) formatted events are mostly the same for a given event_type. The difference lies in the limited set of fields that Caliper can send with the message, compared with the fields we'd expect to see in Canvas Data, the API, or even the Canvas UI.
With regard to asset_accessed events, the two formats should be nearly identical apart from the differences stated above. Below is how I validate these assumptions by collecting both formats. Live Events are currently emitted by the code, not by the data layer, which means that some messages may fail to reach your queue, so there is a margin of error. It would be nice if file_fact had a count, but I'm not seeing one.
Here is the query to collect a count for a file resource by id in both formats. You can see that the Caliper format does not offer a filename. You can also see that, for the same file id, the counts in the two formats are almost the same: sometimes off by 1, and occasionally by more. The counts could differ because the messages weren't sent, or because they weren't ingested correctly.
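As a rough illustration of that kind of per-file comparison (not the original query), here is a sketch in Python using an in-memory SQLite database. The table names `caliper_asset_accessed` and `alpha_asset_accessed`, and the columns `event_id`, `file_id` and `filename`, are hypothetical placeholders; map them to however your pipeline lands the Caliper and Alpha (Raw) messages. The rows are made-up toy data that only exist to exercise the query.

```python
import sqlite3

# Hypothetical landing tables: one row per asset_accessed event per format.
# Only the Alpha (Raw) table has a filename; Caliper does not send one.
SCHEMA = """
CREATE TABLE caliper_asset_accessed (event_id TEXT, file_id TEXT);
CREATE TABLE alpha_asset_accessed (event_id TEXT, file_id TEXT, filename TEXT);
"""

# Per-file event counts in each format, plus the gap (alpha - caliper).
# The ids CTE keeps files that appear in only one of the two tables.
COUNT_BY_FILE = """
WITH ids AS (
    SELECT file_id FROM caliper_asset_accessed
    UNION
    SELECT file_id FROM alpha_asset_accessed
),
c AS (
    SELECT file_id, COUNT(*) AS n FROM caliper_asset_accessed GROUP BY file_id
),
a AS (
    SELECT file_id, MAX(filename) AS filename, COUNT(*) AS n
    FROM alpha_asset_accessed GROUP BY file_id
)
SELECT ids.file_id,
       a.filename,
       COALESCE(c.n, 0) AS caliper_count,
       COALESCE(a.n, 0) AS alpha_count,
       COALESCE(a.n, 0) - COALESCE(c.n, 0) AS diff
FROM ids
LEFT JOIN c ON c.file_id = ids.file_id
LEFT JOIN a ON a.file_id = ids.file_id
ORDER BY ids.file_id;
"""


def demo_connection():
    """Build an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy data: event e3 for file 101 never made it into the Caliper table.
    conn.executemany(
        "INSERT INTO caliper_asset_accessed VALUES (?, ?)",
        [("e1", "101"), ("e2", "101"), ("e4", "202")],
    )
    conn.executemany(
        "INSERT INTO alpha_asset_accessed VALUES (?, ?, ?)",
        [
            ("e1", "101", "syllabus.pdf"),
            ("e2", "101", "syllabus.pdf"),
            ("e3", "101", "syllabus.pdf"),
            ("e4", "202", "notes.docx"),
        ],
    )
    return conn


def compare_counts(conn):
    """Return (file_id, filename, caliper_count, alpha_count, diff) rows."""
    return conn.execute(COUNT_BY_FILE).fetchall()


if __name__ == "__main__":
    for row in compare_counts(demo_connection()):
        print(row)
```

A nonzero diff only flags a file worth checking by hand; it can't tell you whether the gap came from a message that was never sent or one that wasn't ingested.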

The numbers in both LE formats are comparable, but when I tried to compare and look at the Requests table I got lost in a few places.
- It appears there are more rows in the requests table for web_application_controller = 'files' than are represented in asset_accessed.
- My counts for LE vs. CD.requests returned considerably more rows for LE than for requests, which seemed odd, so I started hand-picking events to identify what was different.
In the LE Caliper format, I found that many events have actor_type SoftwareApplication rather than Person. This field doesn't exist in the Alpha format. However, Caliper#actor_user_login is present when actor_type is Person, and Alpha#user_login_meta is comparable. Using these fields in a WHERE clause, I get the same row counts for each day. I then kept trying to identify the differences between LE and CD.requests, and I think I need to pull this into Tableau to get a better view of the fields and values.
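A minimal sketch of that daily comparison, again using Python's built-in sqlite3 with hypothetical tables: `caliper_events` (with `event_time`, `actor_type`, `actor_user_login`) and `alpha_events` (with `event_time`, `user_login_meta`). The Caliper side keeps only Person actors with a login, and the Alpha side keeps only rows that carry user_login_meta. The rows are invented for illustration.

```python
import sqlite3

# Hypothetical flattened tables for the two Live Events formats.
SCHEMA = """
CREATE TABLE caliper_events (event_time TEXT, actor_type TEXT, actor_user_login TEXT);
CREATE TABLE alpha_events (event_time TEXT, user_login_meta TEXT);
"""

# Daily counts after filtering out non-person actors on both sides.
# The LEFT JOIN reports days present in the Alpha table.
DAILY_COUNTS = """
WITH c AS (
    SELECT DATE(event_time) AS day, COUNT(*) AS n
    FROM caliper_events
    WHERE actor_type = 'Person' AND actor_user_login IS NOT NULL
    GROUP BY DATE(event_time)
),
a AS (
    SELECT DATE(event_time) AS day, COUNT(*) AS n
    FROM alpha_events
    WHERE user_login_meta IS NOT NULL
    GROUP BY DATE(event_time)
)
SELECT a.day, COALESCE(c.n, 0) AS caliper_count, a.n AS alpha_count
FROM a LEFT JOIN c ON c.day = a.day
ORDER BY a.day;
"""


def demo_connection():
    """In-memory database with made-up rows for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO caliper_events VALUES (?, ?, ?)",
        [
            ("2020-03-02 09:00:00", "Person", "jdoe"),
            # A system-generated event with no user login; filtered out.
            ("2020-03-02 09:00:01", "SoftwareApplication", None),
            ("2020-03-02 10:00:00", "Person", "asmith"),
            ("2020-03-03 08:00:00", "Person", "jdoe"),
        ],
    )
    conn.executemany(
        "INSERT INTO alpha_events VALUES (?, ?)",
        [
            ("2020-03-02 09:00:00", "jdoe"),
            ("2020-03-02 10:00:00", "asmith"),
            ("2020-03-03 08:00:00", "jdoe"),
            ("2020-03-03 08:30:00", None),  # no login; filtered out
        ],
    )
    return conn


def daily_counts(conn):
    """Return (day, caliper_count, alpha_count) rows."""
    return conn.execute(DAILY_COUNTS).fetchall()


if __name__ == "__main__":
    for row in daily_counts(demo_connection()):
        print(row)
```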

I'm currently working with a LEFT JOIN on LE.request_id = CD.requests.id, filtering WHERE requests.id IS NULL or IS NOT NULL. Quite a number of events don't appear to be present in requests, and some requests don't have events.
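For anyone trying the same reconciliation, here is a small sketch of the two anti-join directions, using sqlite3 and hypothetical tables `le_events` (`event_id`, `request_id`) and `requests` (`id`, `web_application_controller`). These stand in for the Live Events store and the Canvas Data requests table; the rows are toy data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE le_events (event_id TEXT, request_id TEXT);
CREATE TABLE requests (id TEXT, web_application_controller TEXT);
"""

# Events whose request_id has no matching row in requests.
EVENTS_WITHOUT_REQUEST = """
SELECT e.event_id
FROM le_events AS e
LEFT JOIN requests AS r ON r.id = e.request_id
WHERE r.id IS NULL
ORDER BY e.event_id;
"""

# 'files' requests with no matching Live Event. This side is only
# visible when the LEFT JOIN starts from requests.
REQUESTS_WITHOUT_EVENT = """
SELECT r.id
FROM requests AS r
LEFT JOIN le_events AS e ON e.request_id = r.id
WHERE r.web_application_controller = 'files'
  AND e.request_id IS NULL
ORDER BY r.id;
"""


def demo_connection():
    """In-memory database with made-up rows for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO le_events VALUES (?, ?)",
        [("ev1", "req1"), ("ev2", "req2"), ("ev3", None), ("ev4", "req9")],
    )
    conn.executemany(
        "INSERT INTO requests VALUES (?, ?)",
        [("req1", "files"), ("req2", "files"), ("req3", "files"), ("req4", "courses")],
    )
    return conn


def first_column(conn, sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = demo_connection()
    print("events without request:", first_column(conn, EVENTS_WITHOUT_REQUEST))
    print("requests without event:", first_column(conn, REQUESTS_WITHOUT_EVENT))
```

A LEFT JOIN that starts from the Live Events side can only surface events with no matching request; finding requests with no matching event needs the join to start from requests (or a FULL OUTER JOIN, where the database supports it).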
This discussion post is outdated and has been archived. Please use the Community question forums and official documentation for the most current and accurate information.