I have started to transition some of our reports to Power BI, although most have been unrelated to academics thus far.
If Canvas BI developers want to collaborate and / or share count me in though.
To expand on Daniel's response, we download daily from the Canvas Data Portal using the canvasDataCli tool. This is then uploaded into an Oracle database and presented as views in the data lake that Daniel refers to.
It seems like a lot of institutions are using the bulk downloads. We are querying Redshift via ODBC into excel files and putting them in sharepoint. Our Power BI reports refresh from those sharepoint files. The sharepoint files have auto refresh through a program called Power Update or a PowerShell script. Reworking this via SQL Server is on the agenda but not crucial as we are not dealing with huge amounts of data.
In the future I would like to develop an application which connects Power BI to redshift via the API interface. All the data analysis / visuals would be created through Python scripts and the managing program would be in C++. This is still VERY far away though.
Our requests data is currently about 200 GB and growing at about 3 GB per day during the semester. What is your experience of performance when querying requests data via Redshift?
When downloading using the canvasDataCli sync process, only files which don't already exist in the local data store are downloaded, so it's not really a bulk download.
There are only two reports I have created which rely on the Request table. For both reports I am looking at / analyzing real specific things so the queries I am running are not pulling in more than 500k rows. I also do not bring the tables into excel, rather, I load them to the data model through PowerQuery. Usually, there is a good bit of filtering I can do within PowerQuery before that as well. For Power BI the data query process is very similar to PowerQuery. It does a good job of compressing the data so you can get a good amount in before it starts to lag.
The only other thing that helps is having a computer with an i7, SSD, graphics card and a good amount of ram helps. If I query just an open month's worth of data then it will take a ton of time to complete. However, if you know what you want from the table it is not too bad.