To Our Amazing Educators Everywhere,
Happy Teacher Appreciation Week!
Found this content helpful? Log in or sign up to leave a like!
I'm working with a Canvas Data 2 CSV export of the QUIZ_SUBMISSIONS table. The column SUBMISSION_DATA is usually a well formed JSON string. I have two questions about the column.
1. There are two rows (out of over 10 million) that are not proper JSON syntax. Instead, the values are both "__dap_oversized_truncated__". In our import we're going to watch for that value and change it to a NULL so it doesn't trip up our processing. Is the actual value stored somewhere else?
2. For quiz answers where question_type is 'multiple answer question', the answers are stored in the JSON string with a syntax like this:
{"correct":"partial","points":1.0,"question_id":5363828,"text":"","answer_8325":"1","answer_3704":"0","answer_9236":"1","answer_446":"0","answer_3639":"0","more_comments":""}
This is just one "row" in the JSON string. Since the answer value isn't provided as a known JSON key in the string, how can I extract the value for the answer? I do understand that the key name has the answer_id embedded in it. But I don't know how to extract the value for a key when I don't know name of the key.
Those are really good questions. I started my Canvas Data 2 journey just about 24 hours ago and stumbled across some of what you asked about as I was trying to figure things out.
The answer to the first question is explained in the Canvas Data 2 Technical Frequently Asked Questions document.
What does '__cd2_oversized_truncated__' mean? (Some of the fields/columns can have this as a value.)
__cd2_oversized_truncated__
is a special placeholder (string constant) that indicates that the original data was too large to include in the result, usually several MBs in size. Unfortunately, it is not possible to recover the value of these fields/columns. However, such large values are typically the result of a mistake (e.g. a student copying a binary Word document into a plain text field), and seldom carry valuable information useful for analytics.
So the answer is no, you won't find those values somewhere else in Canvas Data 2. You may be able to get them through the REST API if you find out you really need them.
For the second question, I looked at the Canvas Data 2 Frequently Asked Questions document, which has a link to the OpenAPI spec at the bottom that takes you to the Instructure API Gateway page. Click on Data Access Platform. Near the top are schemas for PostgreSQL, MySQL, or JSON. You can also scroll down to appropriate table to get a description, but not the type, of each field.
You mentioned this was related to quiz_submissions and when I look there, I see that quiz_submissions.submission_data has a type "text" with a comment "Student answers to quiz, it shares JSON array of answers for only graded quiz submissions, it has NULL if quiz hasn''t been graded yet. Contains the `question_id`, whether the question was answered correctly, the `answer_id` and answer text."
In other words, it's a JSON object and it's stored in a text field that and not processed or split when putting it into your local CD2 database. That is, you do not try to create columns for each answer key when you stick it in the database. Just store it as it is.
In your analysis, you will need to handle it as you would any other JSON object. That may mean using the JSON functions of your database or it may mean processing it with a script. Since you don't know the name of the key, it will probably be easiest to process using a script.
If you haven't seen those documents, I'm finding them in the Canvas Data 2 section of the Canvas Admin Guide.
Hi James,
We are also trying to extract the Question ID and Answer ID pairs from the SUBMISSION_DATA field in the QUIZ_SUBMISSIONS table. Then we use these ID's to join to the QUIZ_QUESTIONS table to get the Question Text and the Answer Text (which requires unpacking the Answers field in QUIZ_QUESTIONS). We are using SQL in an Snowflake environment. Does anyone know of tables that may already exist with these fields unpacked into rows? Or is there any general purpose code that will work for all Question Types? Just trying to see if there is an easier way.
Thanks
Dexter
I'm trying to come up with a more elegant solution to the "__cd2_oversized_truncated__" value other than deleting the row that has that value. Since the value is of no use, other that pointing out that something isn't there when it should be, deleting the row works, for now.
"__cd2_oversized_truncated__" is not a valid JSON string so it prevents me from putting an Oracle JSON check constraint on the column. A search index requires the JSON check constraint.
Instead of deleting the row, I was considering replacing the value with something that is acceptable to a JSON interpreter. The easy option is to simply use "[]" (square brackets) like many other (valid) rows are using. But I would prefer to put something in the column that could be used to show that the value in the column is an "error".
Any suggestions on what JSON value could be put there?
I resorted to a spectacularly complex series of substitutions treating the string as text to transform the JSON such that I ended up with a consistent key and the answers as the value. I actually broke a lot of these out into additional tables so that they could be joined in SQL and actually utilised.
Not necessarily the most sustainable approach, as it relies on manipulating the JSON, and there are lots of edge cases that make it very complex.
Just for fun.....this is Microsoft SQL Server - [submission_data] corresponds to the subset you have posted - and was unpacked from the larger JSON string in an earlier step.
Thank you for your response and code snippet.
I was hoping there was a way to process the data without resorting to a cursor (row-by-row) approach. I spent enough time trying to find a more elegant solution that I was fairly certain that I was going to have to get down and dirty and write some code. True/false, multiple choice and even short essay answers are simple to deconstruct. Oracle has some handy features for dealing with JSON, as long as the keys are predictable and known.
One option I considered was using the question (from quiz_questions) to construct dynamic SQL for the multiple answer question. The question knows the answer_id values, so it would work. But then I'm back to operating on a row-by-row basis, which wouldn't be too bad at the question level instead of the answer level.
Another option I'm exploring is using a text index that is JSON-aware.
I extract out all the answers into separate (custom) tables as I import the data - expanding out into multiple rows per submission, and handling single and multi-answer responses into different tables.
Whilst the manipulation of the JSON to generate answer keys works, it is not particularly elegant or sustainable - I am going to revisit this and work back up the chain - when I unpack the original questions, and the single answers, I will change the keys to the answer_nnnn format - then I should get a consistent key that I can join on.
To participate in the Instructure Community, you need to sign up or log in:
Sign In