cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
hong_chau
Community Participant

How are the SubAccount Analytics calculated?

Jump to solution

I've asked our Business Intelligence team to total the files uploaded, discussion topics created, assignments created, and media recordings for sub-accounts to recreate our own version of Subaccount Analytics. Our team is able to summarize the first three, but we aren't sure how "media recordings" are calculated.

>> Is there data for "Media Recordings" in the portal? If not, how is this number aggregated?

To get to a subaccount analytics: yourinstitution.instructure.com/accounts/[somenumber]/analytics

Screen Shot 2016-06-30 at 10.38.42 AM.png

1 Solution

Accepted Solutions
ccoan
Instructure
Instructure

Hello Hong,

Media Objects are indeed shown inside of Canvas Data. However maybe not in the places you expect them. For example you can look in submission_dim at the submission_type field. There are a couple other places they are used as well, such as in the conversation_dim. As for how it's calculated in the UI. It actually looks at an internal field called "MediaObject".. That number is a calculation of all media objects that are active, and are attached to a course in the subaccount you're looking at.

Since we don't export MediaObjects. I did some tests, and it appears there are a couple contexts where you can't get an exact count, e.g. theres at least one media here. I've opened an internal request to add MediaObjects to the export so you can at least grab the same count, and maybe get a few more pieces of info on context for pieces of media.

View solution in original post

8 Replies
ccoan
Instructure
Instructure

Hello Hong,

Media Objects are indeed shown inside of Canvas Data. However maybe not in the places you expect them. For example you can look in submission_dim at the submission_type field. There are a couple other places they are used as well, such as in the conversation_dim. As for how it's calculated in the UI. It actually looks at an internal field called "MediaObject".. That number is a calculation of all media objects that are active, and are attached to a course in the subaccount you're looking at.

Since we don't export MediaObjects. I did some tests, and it appears there are a couple contexts where you can't get an exact count, e.g. theres at least one media here. I've opened an internal request to add MediaObjects to the export so you can at least grab the same count, and maybe get a few more pieces of info on context for pieces of media.

View solution in original post

hong_chau
Community Participant

Deactivated user, thanks for the explanation!

Could you further extrapolate how the other analytics are calculated?

For example, our reports do not return the same number of items for any of the Courses, Assignments, Discussion Topics, and Files Uploaded when we roll up our aggregates.

Is there a way to download the raw files that calculate these numbers? Or, can you point to the specific logic if we are using the Canvas Data Portal files?

Thanks,

Hong

brenden_goetz
Community Participant

I'll second this question! Any way to see the logic that is used to calculate specific metrics (e.g. assignments per course) within the analytics in Canvas would be great so we can understand why or own numbers do or do not match up. Thanks!

Hello Hong (and  @brenden_goetz ​),

Sorry it took me so long to reply to this! Ramping up for fall start is always full of fun for us here at Instructure. I'd be happy to help answer some questions, and provide more insight where I can.

The first thing I think I'll provide is the analytics github repo by instructure which can be located: HERE. Now most of this is pretty gnarly ruby code, (but!) for those of you ruby readers out there hopefully it can shed some insight.

The thing I would think off the top of my head that could be throwing off your analytics is the fact we scope to term based items. I.E. the default term, etc based on whichever term you're looking at in Account Analytics. The important thing to remember here is you can't just look at the term dates at the account settings becauses courses can override on an individual level with the option in the course settings to override the term dates. My first bet would be to take a look there.

However I'll do some digging over the weekend, and try to get some queries, and see if anything jumps out at me. I'm not sure how willing you'd be to share your queries (maybe even through a support case referencing this thread?), but in the meantime I'll try to dig through, and get some matching queries making notes of gotchas/making sure there's nothing wrong.

Thanks,

Eric

hong_chau
Community Participant

Thanks Deactivated user​! I am not a Ruby person, but I'll send the Github link to our business intelligence folks and see what they say! In some cases, we're able to get our numbers relatively close (within 50 or so items), but for files (for example), we end up at least 400 off! Any insight would certainly be welcomed!


Best of luck with ramp up!

brenden_goetz
Community Participant

Deactivated user​ - thanks for the reply. I'll dig through the Ruby code on GitHub and see what I can figure out.

In the meantime, here are a couple queries and their corresponding results, as well as the results that are displayed on the Account Analytics page in case that helps us troubleshoot.

-- attempt to count courses for spring 2016

-- result is 2,412 courses; Analytics page shows 2,413... so this is really close
select count(id)

from course_dim

where enrollment_term_id = '10430000000000078'
  and workflow_state = 'available';

-- attempt to count files uploaded spring 2016

-- result is 224,084; Analytics page shows 175,939... so this is way off

select count(f.id)

from file_dim f

join course_dim c
  on f.course_id = c.id
  and c.enrollment_term_id = '10430000000000078'
where f.file_state ~ 'hidden'
   or f.file_state ~ 'available';

Hey  @brenden_goetz ,

I think I've figured out the discrepancies in your queries, and it's definitely I think it's sort of an edge case in analytics. One that you don't necessarily think about, but makes sense. The key difference here is analytics will only count UNIQUE files, it determines a file is unique based on the files MD5 (a hashing algorithm used to identify when files have changed or not). However a singular file can have multiple file objects in the Canvas DB (that get exported by Canvas Data). For example you can take a word document, and upload it to Course A. If this is the first time it's been uploaded it will have a unique md5, and be added to analytics as well as have a file object created. Now you take the same file, and upload it to Course B. Since the file is the same the counter does not go up in analytics, however a new file object is made that will be exported by canvas data. This makes controlling permissions (plus the general management of canvas easier). As such we'll have to take this into account when writing SQL Queries.

As such I've remade your files query to return a result that seems correct (it looks like this is the current term. If I got something wrong please let me know (it may scope by status)!):

SELECT COUNT(DISTINCT f.md5)

  from file_dim f

  join course_dim c

  on f.course_id = c.id

  and c.enrollment_term_id = '10430000000000078';

As for the courses query that's really close. I feel like that may just be a data update problem i.e. it hadn't yet pulled over but your query should indeed match what's inside the analytics page.

- Eric

brenden_goetz
Community Participant

Thanks, Deactivated user​! Very well explained.