cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
afuenmayor
Community Member

Joining Canvas Fact and Dimension Files

Jump to solution

New canvas user here with limited experience working with star schema data.  I am having trouble figuring out which fact and dimension files I need to build a simple table.  The table I want would look like this:

Assignment Table

Description: Each row is the score a given student got on a given assignment in a given course,section,term.

Student IDCourseSectionTermAssignment DescriptionAssignment DateScorePoints Possible
1001ANAT1700011605HW Assign 18/10/20181820
1001ANAT1700011605HW Assign 28/20/20181520
1001ANAT1700011605HW Assign 39/6/20181120
1001ANAT1700011605HW Assign 410/11/20181920
1001ANAT1700011605HW Assign 511/16/20182020
1501ENGL3700031605HW Assign 18/10/20181520
1501ENGL3700031605HW Assign 28/20/20181020
1501ENGL3700031605HW Assign 39/6/2018820
1501ENGL3700031605HW Assign 410/11/2018020
1501ENGL3700031605HW Assign 511/16/201812

20

Any help putting this together would be greatly appreciated. Thanks!

Tags (1)
1 Solution
afuenmayor
Community Member

I figured out what was the issue.  I was pulling in the files into SPSS 24 to just explore their structure.  The SPSS auto detect variable type was reading the id variable as a numeric and rounding to nearest 10s.  When i switched the data read to force the variable into a string it pulled in the ID variables properly.  Glad it the problem wasn't in the underlying data.

Thanks again for all your help!

View solution in original post

8 Replies
kona
Community Coach
Community Coach

 @afuenmayor , greetings! Due to the technical nature of this question I am sharing it with the Canvas Developers and Data and Analytics groups in the Community. You might consider joining these group so you get access to their resources and information.

Kona

0 Kudos
James
Community Champion

 @afuenmayor ,

 

You'll want to start with the documentation of the Canvas data schema: Canvas Data Portal 

The assignment tables contain information about the assignments themselves, but no scores. Scores are contained in the submissions tables. You'll also need to bring in the course_dim to get the name of the course, the enrollment_dim to get the section ID for the enrollment to know which section to use, and then the course_section_dim to get the name of the section.

If the student ID you have is their SIS student ID, then you'll need to bring in the pseudonym tables. You may possibly need the term tables if you want a pretty name for the term.

Whether you use the _fact or _dim table depends on the information that you need and what indexing you have done so that it's not horrifically slow.

I'm not sure what you mean by "assignment date". Is that the date it's due, the date it's available until, or the date the student submitted the assignment? The date of submission is in the submission_dim table while the others would be in the assignment_dim table -- unless there was an override for a section (assignment_override) or user (assignment_override_user).

When building the SQL for this, I would start with the submission_fact table to get the score. It has connections to the assignment, course, enrollment_term, and enrollment_rollup_dim tables.


I haven't used the enrollment_rollup_dim, but you could join that with the enrollment_dim table using the student_enrollment_id and then join that to the course_section_dim to get the name of the section.

Along the way, you'll need to check things like workflow_state to make sure things are published or active. There are a lot of other restrictions to make sure you get the right data out.

I probably left something out, but hopefully that's enough to get you started.

afuenmayor
Community Member

I am looking at course_section_dim and the only unduplicated key on the table is name which names each section we've put into canvas.  My table has about 40000 rows and 40000 unique values for sectionname but only 8000 unique values of course_section_id.  How can I use course_section_id as key if each value maps to about 5 different sections?

Should the course_section_id in course_section_dim be unduplicated? Is my data bungled?

0 Kudos
James
Community Champion

It sounds bungled or at least the names of the columns.

In most of the dim tables, when there is an id column, it is the primary field for that table. That's the way that my course_section_dim is setup. I also have canvas_id as a unique key. There is no requirement that name be unique. In fact, as of last fall, we had 3587 sections named "Need to complete Orientation".

It sounds like someone is renaming the fields from Canvas at there is no field called course_section_id in course_section_dim as it comes from Canvas. I stuck with their names from the schema and recommend that (or a shortened version if you're using Oracle) so it matches up with the documentation and the way that data comes from the API. Everywhere in the Canvas API, id represents the primary field for that object type (id is a submission_id if you're looking at submissions), and it's mostly the same in Canvas Data.

I will acknowledge, that a strong argument could be made for combining the name of the table and the id field, for example the course_section_dim.id field gets named course_section_id, as that makes it easier to join tables together with software that automatically tries to look for common fields (like Tableau). However, if someone is trying to reverse engineer what the fields are from looking at the data, you will probably start over with the documented schema. It could also get bungled if someone messed up the SQL statement that did the import and transposed a couple of rows.

afuenmayor
Community Member

I figured out what was the issue.  I was pulling in the files into SPSS 24 to just explore their structure.  The SPSS auto detect variable type was reading the id variable as a numeric and rounding to nearest 10s.  When i switched the data read to force the variable into a string it pulled in the ID variables properly.  Glad it the problem wasn't in the underlying data.

Thanks again for all your help!

James
Community Champion

That would do it. Thanks for coming back and sharing that solution in case someone else runs into it as well.

jasmiene_doming
Community Member

Hi. I am experiencing the same challenge here in our school. I wanted to be able to find the right table to join in order to produce specific reports. In my case, I use Power BI.  To start with, I would like to generate a monthly status report for courses under each sub-account level. I need to join tables that will contain:

selected sub-account  + selected date range + course id + faculty who handles the course + number of  content published + number of unique views + number of unique responses 

 

jasmiene_doming
Community Member

SAMPLE NO. 1 

selected sub-account + selected date range + course id + faculty who handles the course + total no. of students enrolled + number of content published 

 

SUB-ACCT

DATE RANGE

COURSE ID

FACULTY

NO. STUDENTS ENROLLED

NO. OF CONTENTS PUBLISHED

CAS

Sept.2-Sept. 30, 2019

ADV 2

Juana Dela Cruz

45

5

 

SAMPLE NO. 1 

selected sub-account + selected date range + course id + faculty who handles the course + total no. of students enrolled + specific content published + number of unique views + number of unique responses 

 

SUB-ACCT

DATE RANGE

COURSE ID

FACULTY

NO. STUDENTS ENROLLED

SPECIFIC CONTENTS PUBLISHED (Heading)

NO OF UNIQUE VIEWS

NO. OF UNIQUE RESPONSES

CAS

Sept.2-Sept. 30, 2019

ADV 2

Juana Dela Cruz

45

Announcement 1

15

0

CAS

Sept.2-Sept. 30, 2019

ADV 2

Juana Dela Cruz

45

Quiz 1

43

37

CAS

Sept.2-Sept. 30, 2019

ADV 2

Juana Dela Cruz

45

Discussion 1

2

0

CAS

Sept.2-Sept. 30, 2019

ADV 2

Juana Dela Cruz

45

Discussion 2

4

0

CAS

Sept.2-Sept. 30, 2019

ADV 2

Juana Dela Cruz

45

Discussion 3

41

0