[ARCHIVED] Using Canvas Data CSVs to generate rudimentary stats?

tim_odonovan
Community Participant

Hi all,

Due to the Covid-19 outbreak, our university is now fully online. The Admin>Analytics section gives us page views and participations, which provides some high-level data, but our senior management team is looking for further metrics across our various platforms to understand how much activity is taking place. When they can't see students on campus, data seemingly matters more than ever!

 

We're reasonably new adopters of Canvas and have not yet managed to use the Canvas Data portal properly, but I did set up canvasDataCli to at least get the files. Unfortunately, with everything else going on, we have no capacity at the moment to set up any data warehouse integration either, so we're stuck with the raw CSV files.

I'm pretty proficient at data manipulation on the Linux command line with tools such as grep, sed and cut, so I think I have been able to get things such as:

* the number of conferences started each day, using the 9th field in the conference_dim file (conference started at):

gzcat conference_dim-00000-XXXXX  | cut  -f9 | sort | grep -v "N" | cut -d" " -f1 | uniq -c

* the number of unique users using Canvas each day, by grepping each date against the most recently downloaded requests files and counting unique occurrences of the 6th field (the foreign key to user_dim), such as:

zgrep 2020-03-14 requests-00000-XXXX* | cut -f6 | sort | uniq | wc -l
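
A slightly stricter take on the two counts above might look like the sketch below. It matches on a specific field rather than anywhere in the line, and it skips NULL values, which are assumed here to be written as \N. The function names are hypothetical, and the assumption that field 2 of the requests files holds the request timestamp is not confirmed above and should be checked against the schema. The field numbers 9 and 6 are the ones used in the commands above.

```shell
# Sketch only: assumes tab-separated, gzip-compressed files without header
# rows, and NULLs written as \N. Function names are hypothetical.

# Count conferences started per day.
# $1 = field number of the "started at" column (9 in the command above)
# remaining arguments = one or more conference_dim files
conferences_per_day() {
    field=$1; shift
    gzip -dc "$@" |
        awk -F'\t' -v f="$field" \
            '$f != "\\N" && $f != "" { print substr($f, 1, 10) }' |
        sort | uniq -c
}

# Count distinct non-NULL user ids with a request on a given day.
# $1 = date as YYYY-MM-DD
# remaining arguments = one or more requests files
# ASSUMPTION: field 2 is the request timestamp; field 6 is the user id.
unique_users_on() {
    day=$1; shift
    gzip -dc "$@" |
        awk -F'\t' -v d="$day" \
            'index($2, d) == 1 && $6 != "\\N" { print $6 }' |
        sort -u | wc -l
}
```

With those defined, the two counts would be run as `conferences_per_day 9 conference_dim-00000-XXXXX` and `unique_users_on 2020-03-14 requests-00000-XXXX*`. Matching the date only at the start of the timestamp field avoids counting rows where the same date string happens to appear in another column, and dropping \N keeps requests with no user id from being counted as one extra "user".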

I know this is not very efficient, but it is perhaps the quickest way for me to get some rudimentary stats right now. The numbers I'm getting out of this seem to be in line with what I'd expect, but before I start publishing numbers, is this a valid approach?

Thanks,

Tim
