Hi @Edina_Tipter and @LeventeHunyadi
When accessing the Parquet datasets for submissions and late_policies, there are inconsistencies in how downstream processing jobs interpret the schema of the Parquet files: a few columns are ambiguous between decimal and float64.
Could the Canvas Data team please look into correcting this? Currently these columns carry no format, and leaving the interpretation to the downstream stack (which resolves them as decimal) is causing problems. Emitting them with format: float64 from the upstream Canvas jobs would let downstream consumer stacks interpret the Parquet files consistently as float64, without ambiguity, since every other number column in the schema already has format float64. The affected columns are listed below, followed by a sketch of the cast our downstream consumers currently have to apply.
dataset: submissions
- column points_deducted number&lt;no-format&gt; expected number&lt;float64&gt;
dataset: late_policies
- column missing_submission_deduction number<no-format> expected number<float64>
- column late_submission_deduction number<no-format> expected number<float64>
- column late_submission_minimum_percent number<no-format> expected number<float64>
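
For illustration, a minimal sketch of that cast, assuming a pandas + pyarrow consumer (the file name here is hypothetical):

import pyarrow.parquet as pq

# With a decimal logical type and no float64 format, pandas receives these
# columns as Python decimal.Decimal objects (dtype=object), not float64.
df = pq.read_table("submissions.parquet").to_pandas()

# Explicit cast each downstream consumer currently has to apply:
df["points_deducted"] = df["points_deducted"].astype("float64")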
Thanks
This is how these columns are declared in our descriptor:
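(The declaration itself did not carry over into this thread; roughly, it has this shape. This is a reconstruction from the explanation below, not the actual descriptor source: Precision here is a stand-in for the descriptor's own annotation, and which field gets numeric(6,2) versus numeric(5,2) is inferred from the Parquet metadata shown later.)

from dataclasses import dataclass
from decimal import Decimal
from typing import Annotated, Optional

@dataclass
class Precision:
    # Stand-in for the descriptor's annotation: total significant
    # digits first, then decimal digits.
    significant_digits: int
    decimal_digits: int

points_deducted: Optional[Annotated[Decimal, Precision(6, 2)]]
missing_submission_deduction: Optional[Annotated[Decimal, Precision(5, 2)]]
late_submission_deduction: Optional[Annotated[Decimal, Precision(5, 2)]]
late_submission_minimum_percent: Optional[Annotated[Decimal, Precision(5, 2)]]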
Optional stands for nullable, Decimal is a fixed-point type, and in Precision the first number is the total number of significant digits while the second is the number of decimal digits. (Annotated is a type wrapper; it has no relevance to the issue.) In PostgreSQL, these would correspond to numeric(6,2) or numeric(5,2).
That said, all of these should be fixed-point numbers, not floating-point numbers. I will relay this issue to the team so they can look into it in more depth.
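
To illustrate why fixed-point matters for values like grade deductions, a generic Python example (not code from either stack):

from decimal import Decimal

# Binary floating point cannot represent most decimal fractions exactly:
print(0.1 + 0.2)                        # 0.30000000000000004
# Fixed-point decimal arithmetic keeps the exact value:
print(Decimal("0.1") + Decimal("0.2"))  # 0.3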
We have triggered a query job and inspected the Parquet output for a test account. This is how the Parquet metadata looks in parquet-tools inspect:
############ Column(late_submission_deduction) ############
name: late_submission_deduction
path: value.late_submission_deduction
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Decimal(precision=5, scale=2)
converted_type (legacy): DECIMAL
compression: GZIP (space_saved: -61%)
This seems pretty normal, with the correct fixed-point logical type applied to the column.
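
The same can be confirmed programmatically; a minimal sketch with pyarrow (the file name is hypothetical, and the value struct follows the path shown in the inspect output above):

import pyarrow.parquet as pq

schema = pq.read_schema("late_policies.parquet")
# The column path is value.late_submission_deduction, i.e. nested in a struct:
value_struct = schema.field("value").type
print(value_struct.field("late_submission_deduction").type)  # decimal128(5, 2)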