Community Help

tim_odonovan · ‎03-17-2020

Hi all,

Due to the Covid-19 outbreak, our university is now fully online. We have usage data from the Admin>Analytics section to get page views and participations, which gives some high level data, but our senior management team are looking for further metrics on our various platforms to understand what level of activity is being undertaken. When they can't see students on campus, data is seemingly more important now!

We're reasonably new adopters of canvas, and have not yet managed to use the canvas data portal properly, but I did set up canvasDataCli to at least get the files. Unfortunately, with everything else going on, we have no capacity at the moment to set up any data warehouse integration either, so we're stuck with the raw csv files.

I'm pretty proficient at data manipulation on the linux command line with things such as grep/sed/cut etc, so I think I have been able to get things such

* the number of conference started each day by using the 9th field in the conferences dim file (conference started at)

* the number of unique users using canvas each day by grepping each date against the more recently download requests files, and looking for unique occurrences of the 6th field (foreign key from users_dim), such as:

zgrep 2020-03-14 requests-00000-XXXX* | cut -f6 | sort | uniq | wc -l

I know this is not very efficient, but perhaps the quickest way for me to now get some rudimentary stats. The numbers I'm getting out of this seem to be in line with what I'd expect, but before I start publishing numbers, is this a valid approach?

Thanks,

Tim

robotcars · ‎03-17-2020

@tim_odonovan ,

I tried your example as:

>

19 2020-03-10
14 2020-03-11
17 2020-03-12
10 2020-03-13

and matched it with sql:

SELECT CONVERT(date, start_at), count(*)

FROM conference_dim
GROUP BY CONVERT(date, start_at)
ORDER BY CONVERT(date, start_at) ASC

>

2020-03-10 19
2020-03-11 14
2020-03-12 17
2020-03-13 10

Without SQL or something, getting granular stats (course, subject, account) is going to be hard.

Check out some of the following, which are command line tools for this.

csvkit 1.0.5 — csvkit 1.0.5 documentation

GitHub - harelba/q: q - Run SQL directly on CSV or TSV files

VisiData

edit:

With q

q -t -H "SELECT strftime('%Y-%m-%d',start_at) conf_date, count(*) FROM ./conference_dim.txt GROUP BY strftime('%Y-%m-%d',start_at)"

>

2020-03-10 19
2020-03-11 14
2020-03-12 17
2020-03-13 10

tim_odonovan · ‎03-17-2020

Hi Robert!

Good to see it matches with SQL - thanks for checking it out!

Yes, I'll be managing expectations about generating only very basic stats until we get a chance to do it properly.

Tim

[ARCHIVED] Using Canvas Data CSVs to generate rudimentary stats?

Archived

Questions

You're signed out

[ARCHIVED] Using Canvas Data CSVs to generate rudimentary stats?

Archived

Questions

Community Help

View our top guides and resources: