Community help

benjamin_rodrig

Hey there,

My institution is going to be implementing CD2 in SQL Server and Power BI in the not to distant future.

I'm reaching out to see if anyone might be interested in sharing techniques, and lessons learned in any of the following:

how they orgainzed the data, if and how they went about in dimensional modeling of Canvas Data, joining to Banner data, and how they were able to leverage apps in Power BI to give the most value to thier business users. Any feedback would be greatly appreciated!

Thanks!

Best regards,

Ben R

Pete5484

Are you planning to do this in Azure? via Automation runbook? Durable function? It's pretty straightforward to get data from MSSQL to PowerBi, but useful to know the reasearch questions you want to solve to begin.

benjamin_rodrig

Thanks Pete!

We are going to open the project soon and are still in the requirements gathering stage. I think the first question we are going to tackle is:

Identify which students are in need of remediation for retention purposes.

A lot of the requests for data we get have been somewhat general, not too specific, so more interviews are needed. But from that question alone, I think we can still make something useful like:

Over a period of n weeks related to the course start and end dates (Banner part of term)?
Of the students in a program, class, campus, etc., (Banner student data)
- which have an unposted current score < n?
- are missing n assignments?
- how much has their score / missing assignments changed during the duration of the course?

The KPI of retention, is another challenge to track, that we haven't quite gotten to yet but might be in reductions in withdrawals, increase in grade average, or registration for courses in the same program the following term.

Best regards,

Ben

akhelms

We are using different technology at my institution, but I can still share some things that I've picked up. Each morning, we grab and store the raw Canvas data in an ODS (Operational Data Store) schema on our data warehouse. Later, we run a series of ETL processes that transform the raw Canvas data, join information from our SIS, and store the results in a different set of tables. These new tables are then used for reporting and analytics. We’re still in the early stages of using Canvas data, so that part is rapidly changing for us as new needs are identified, but we're using dimensional modeling practices and ultimately creating star schemas with the data. We haven't yet got to the point of a unified and complete data model, but we're working towards it.

From a technical perspective, I found that using a lot of intermediate staging tables improved performance and made it a lot easier to design/test the data flow. I usually prefer simple ETL flows, where I might transform raw data and join SIS data in a single step. The quantity of data in some Canvas tables, along with the need to perform more complicated transformations to extract SIS keys from Canvas fields, made it much easier to split this out into several mini steps. In some cases, I also made use of table partitioning to avoid re-processing older data that was no longer changing.

When it comes to design, my biggest piece of advice is to assume that you're going to get some wonky data and actively design with that assumption in mind. Our faculty have a lot of latitude in designing their courses, and I found a lot of things in the data I wasn't initially expecting. If an instructor manually enrolls a student into their Canvas course that is not officially enrolled in the SIS, do you report that student's data along with the rest of the class or use your SIS enrollments as the source of truth? Last week, a faculty member (seemingly out of nowhere) updated the enrollment term of their class from several semesters ago and I needed to update one of my joins to SIS data in order to correct the mismatch. A big one (for us) was that we were initially reporting students who had not been logging into their courses, but we found that some of the LTIs being used allowed students to access the material directly (i.e., through the textbook publisher's website) so they weren't going through Canvas to access their assignments - so now we use the last access time OR the latest assignment submission time to determine the student's most recent class activity.

I hope that gave you at least something useful to think about, but let me know if you have any questions!

Best,
Andy

benjamin_rodrig

Hi Andy!

Thank you. Your overview is great, and something that I'll share with our project team to help set the expectations of what it might entail. Great comment about nuances in the data especially related to how instructors and LTIs can throw curveballs that we need to plan for.

Here's a question, as far as discovering wonky data, is that something that your team found internally, or was it discovered by revealing reports to academic stakeholders, and giving them the opportunity to point those out to you?
I suspect that part of the project may be a pilot where stakeholders can preview beta versions of the reports and provide feedback for areas of refinement.

Best regards,

Ben

Seeking advice on CD2, ETL and presentation processes

cd2

Power BI

EDUCAUSE Insights: Data and Decision-Making

Problems with GraphQL in User Pageviews reports

Metrics Easy Button

Analytics API / Metrics calculation logic

CD2: Enhanced Rubrics

Inconsistencies in user submission attempts (new q...

Seeking advice on CD2, ETL and presentation proces...

Sample Data

Canvas Data Access Platform (DAP) Python Client Li...

Seeking Advice: Integrating CD2 Data for Student &...

You're signed out

Seeking advice on CD2, ETL and presentation processes

Community help

View our top guides and resources: