The Instructure Community will enter a read-only state on November 22, 2025 as we prepare to migrate to our new Community platform in early December.
I'm trying to post HTML content pages (e.g. the syllabus) that contain HTML entities such as the HTML5 checkmark (&check;). The content posts successfully, but I think the leading "&" is escaped server-side to "&amp;". I've looked at the JSON I'm posting, and the entities are unexpanded in that file. Is there a standard trick for avoiding this? Otherwise I guess I'll have to remove these characters from my syllabus, is that right?
Are you sending the data as a payload or appended to the query string?
Can you share the code or payload you're trying?
I have this string saved in ~/syl-pp.json:
-------------------
{
"course": {
"syllabus_body": "<table><tr><td class=\"org-left\">&check;</td>\n<td class=\"org-left\">&check;</td>\n<td class=\"org-left\">&check;</td>\n<td class=\"org-left\">&check;</td>\n</tr></table>\n",
"is_public": "nil",
"grading_standard_id": 15,
"license": "cc_by_nc_sa",
"default_view": "syllabus"
}
}
-------------------
I post with this command:
-------------
curl -X PUT -d @/home/matt/syl-pp.json -H 'Authorization: Bearer API-SECRET' "https://uni-baseurl/api/v1/courses/64706" --header "Content-Type: application/json"
-------------
The final result is wrapped in a bunch of other HTML of course, but that snippet exports to this:
--------------
<table><tbody><tr>
<td class="org-left">&amp;check;</td>
<td class="org-left">&amp;check;</td>
<td class="org-left">&amp;check;</td>
<td class="org-left">&amp;check;</td>
</tr></tbody></table>
---------------
If I go into devtools and change each &check; to ✓, they display as checkmarks, so I don't think the issue is in the browser. I imagine this is a standard escaping issue that I ought to be able to figure out on my own, but I can't...
I don't cURL very often... maybe pklove knows?
It's also been a while since I sent HTML to the API, but in Python I UTF-8 encode the HTML string before sending it.
Also check these flags in the curl manual page, "curl - How To Use".
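For comparison, a minimal Python sketch of the UTF-8 approach, using only the standard library (urllib.request, which may not be the client actually used here). It builds the same PUT as the curl command above, reusing its base URL, course ID, and token placeholders, and sends the character itself (✓) rather than the named entity. The function name is hypothetical, and the sketch only builds the request; passing it to urllib.request.urlopen would send it. On the curl side, -d @file strips newlines from the file while --data-binary @file sends it byte for byte; for this JSON that shouldn't matter, since the newlines in syllabus_body are already escaped as \n.

```python
import json
import urllib.request


def build_syllabus_request(base_url, course_id, token, syllabus_html):
    """Build (but do not send) a PUT that updates a course's syllabus_body.

    ensure_ascii=False keeps characters such as a checkmark as-is in the JSON
    text, and .encode("utf-8") turns that text into the bytes that get sent.
    """
    payload = {"course": {"syllabus_body": syllabus_html}}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/courses/{course_id}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# Placeholders copied from the curl command; not a real host or token.
req = build_syllabus_request(
    "https://uni-baseurl", 64706, "API-SECRET", "<p>\u2713 done</p>"
)
# urllib.request.urlopen(req) would actually send the request.
```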
I think it's because Canvas doesn't like &check;. If you try &copy;, it works fine.
I get the same result when I try it in the browser.
Maybe there is a list of allowed entities somewhere.
It looks like you can use the Unicode number instead, i.e. a numeric character reference such as &#10003;.
These work: ✓ ✔
And more fun with ☑ ✅
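If the source is full of named entities, the workaround can be automated rather than done by hand. A minimal Python sketch, assuming the goal is simply to rewrite every HTML5 named entity as its decimal numeric reference before posting. It relies on Python's standard-library copy of the HTML5 entity table (html.entities.html5); the function name is hypothetical. The markup escapes (&amp;, &lt;, &gt;, &quot;, &apos;) are left alone so the HTML structure doesn't change.

```python
import re
from html.entities import html5

# Named references such as &check; or &copy; (trailing ';' required here).
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Markup-significant escapes stay as they are.
KEEP = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(html_text):
    """Rewrite HTML5 named entities as decimal numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in KEEP:
            return match.group(0)
        chars = html5.get(name + ";")  # html5 keys include the ';'
        if chars is None:  # unknown name: leave it untouched
            return match.group(0)
        # A few entities expand to two code points, so emit one ref per char.
        return "".join(f"&#{ord(c)};" for c in chars)

    return NAMED_REF.sub(repl, html_text)


print(named_to_numeric('<td class="org-left">&check;</td> &copy; &amp; &bogus;'))
# → <td class="org-left">&#10003;</td> &#169; &amp; &bogus;
```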
Ah yes, thank you again, Peter! I grabbed the whole W3C list from https://raw.githubusercontent.com/w3c/html/master/entities.json and pasted it into a request. It looks like only a small number of them are supported (I noticed arrows, a few math/logic symbols, and I think some playing-card suits), but most of them just rendered as literal text such as "&Aacute;". I haven't yet figured out how to unlock a page so it's truly public, but I may paste a reference here when I do.
I set Emacs to export checkmarks as their Unicode versions, and will do the same with other problem cases I run into. It's a bit of a bummer because I use my source code as a resource for my students, and it would be nice for them to be able to read the symbol expressions... but it's a very small cost.
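That kind of survey can be scripted. A rough Python sketch, assuming (as observed in this thread) that a rejected entity comes back with its ampersand escaped as &amp;name;. It uses Python's built-in copy of the HTML5 entity table (html.entities.html5) instead of downloading entities.json. The helper names are hypothetical, and the actual step of posting the probe as syllabus_body and fetching the syllabus back is left out.

```python
from html.entities import html5

# Only keys ending in ';' are well-formed references; strip the ';' for names.
ALL_NAMES = sorted(k[:-1] for k in html5 if k.endswith(";"))


def build_probe_html(names):
    """One list item per entity, labelled by name, so results are easy to scan."""
    items = "".join(f"<li>{n}: &{n};</li>" for n in names)
    return f"<ul>{items}</ul>"


def escaped_entities(names, returned_html):
    """Names whose ampersand came back escaped as &amp;name; (i.e. rejected)."""
    return sorted(n for n in names if f"&amp;{n};" in returned_html)


# Usage: send `probe` as the syllabus body, fetch it back, then compare.
probe = build_probe_html(ALL_NAMES)
```

The trailing ";" in the search string keeps one name from matching inside a longer one (for example "not" versus "notin").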
In an earlier post, carroll-ccsd pointed to an IRC response saying that Nokogiri is used to sanitize the HTML text; see "HTML sanitation rules applied to HTML in submission body".
The courses controller (courses_controller.rb) has:
if params_for_create.has_key?(:syllabus_body)
  params_for_create[:syllabus_body] = process_incoming_html_content(params_for_create[:syllabus_body])
end
Eventually this calls Nokogiri to parse the HTML, and somewhere after that the parsed HTML gets sanitized.
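One plausible mechanism, not confirmed from the Canvas source, is a parse-then-serialize round trip: if the parser doesn't recognize a named entity, it keeps "&check;" as literal text, and when the sanitized tree is written back out as HTML, that bare "&" is escaped to "&amp;". A toy Python model of a single text node follows. The KNOWN table is purely hypothetical, chosen only to match what the thread observed (&copy; survives, &check; doesn't); it is not Nokogiri's or Canvas's real entity list.

```python
import html
import re

# Hypothetical entity table for this toy parser (NOT the real one).
KNOWN = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def parse_text(raw):
    """Decode known entities; keep unknown ones as literal characters."""
    return ENTITY.sub(lambda m: KNOWN.get(m.group(1), m.group(0)), raw)


def serialize_text(text):
    """Re-escape a text node for output, as an HTML serializer would."""
    return html.escape(text, quote=False)


def sanitize_round_trip(raw_text):
    return serialize_text(parse_text(raw_text))


print(sanitize_round_trip("&copy; &check;"))
# → © &amp;check;
```

The model also explains why numeric references and raw characters are safe: they decode to a real character before serialization, so no bare "&" is left to escape.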
I considered reposting that, but it doesn't define which characters get scrubbed.
The HTML whitelist only shows which tags are allowed, not which characters.
https://s3.amazonaws.com/tr-learncanvas/docs/Canvas_HTML_Whitelist.pdf
I'd like to see if there's a difference between what the RCE will accept vs the API.
I'll look and ask around next week.
I looked the other day and found a couple of interesting files, but nothing worth posting about, so I didn't. Since you say you're going to look at it, I thought I'd mention what I had found.
There's one file that attempts to take some extended characters and transliterate them; things like "frac34" become "three fourths". This was my best hope at first, but it didn't really look like it was what was happening. However, it was the only place in the source code where I could find "acute". It also lists things like copy, trade, and reg, but where it does, it's talking about translating text generated by Textile. I couldn't find a list of what it accepts and what gets converted.
The other was the sanitize portion. Saving a page, at least as an update, actually calls the API's update-page endpoint, and I would imagine that other places sanitize the input for security reasons. Also, since the editor goes through the API, it takes things like &check; and sends them to Canvas that way. What comes back is &amp;check;. That makes sense, since the same API processes both the RCE edits and the API calls we make.
There are other files at play. There's an htmlEscape JavaScript file and probably a local version, but it's been a long day and I'm too tired to keep tracking things down now. The config/application.rb file has a line that says ActiveSupport::JSON::Encoding.escape_html_entities_in_json = true.
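For what it's worth, ActiveSupport's escape_html_entities_in_json setting makes Rails write "&", "<" and ">" inside JSON strings as \u0026, \u003c and \u003e. That is transport-level escaping which any JSON parser undoes, which suggests it isn't what produces &amp;check; on its own. A small Python imitation of that output (not Rails code; the function name is hypothetical) shows the round trip:

```python
import json


def rails_style_json(obj):
    """Imitate JSON output with HTML-significant characters written as \\uXXXX."""
    text = json.dumps(obj, ensure_ascii=False)
    # '&', '<', '>' can only occur inside JSON strings, so replacing them is safe.
    for ch, esc in (("&", "\\u0026"), ("<", "\\u003c"), (">", "\\u003e")):
        text = text.replace(ch, esc)
    return text


wire = rails_style_json({"syllabus_body": "<td>&check;</td>"})
print(wire)
# → {"syllabus_body": "\u003ctd\u003e\u0026check;\u003c/td\u003e"}
print(json.loads(wire)["syllabus_body"])
# → <td>&check;</td>
```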
I believe there was a similar bug a long time ago: someone reported to Canvas that they were incorrectly converting HTML entities into &amp;. It might be worth reporting again. You should be able to enter an HTML entity without resorting to looking up its Unicode code point.