Community help

matt_price · 11-20-2018

I'm trying to post HTML content pages (e.g. the syllabus) that contain HTML entities such as HTML5 checkmarks (&checkmark;). The content posts successfully, but I think the leading "&" is being expanded server-side to "&amp;". I've looked at the JSON I'm posting, and the entities are unexpanded in that file. Is there a standard trick for avoiding this? Otherwise I guess I'll have to just remove these characters from my syllabus, is that right?

robotcars · 11-20-2018

Are you sending the data as a payload or appended to the query string?

Can you share the code or payload you're trying?

matt_price · 11-20-2018

I have this string saved in ~/syl-pp.json:

-------------------

{
    "course": {
        "syllabus_body": "<table><tr><td class=\"org-left\">&check;</td>\n<td class=\"org-left\">&check;</td>\n<td class=\"org-left\">&check;</td>\n<td class=\"org-left\">&check;</td>\n</tr></table>\n",
        "is_public": "nil",
        "grading_standard_id": 15,
        "license": "cc_by_nc_sa",
        "default_view": "syllabus"
    }
}

-------------------
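Hand-writing JSON like this is error-prone, since every quote and newline inside the HTML has to be backslash-escaped by hand. A minimal sketch, assuming Python 3, builds the same kind of payload with json.dumps so that escaping happens automatically. The field names are copied from the post above, and build_course_payload and make_payload.py are hypothetical names, not part of any Canvas tooling. This only removes hand-escaping mistakes; it does not change what Canvas does with &check;.

```python
import json

# Example HTML as exported (named entities left untouched).
syllabus_html = (
    '<table><tr><td class="org-left">&check;</td>\n'
    '<td class="org-left">&check;</td>\n'
    "</tr></table>\n"
)


def build_course_payload(html: str) -> str:
    """Wrap an HTML string in a {"course": {...}} body (hypothetical helper).

    json.dumps escapes the double quotes and newlines inside the HTML,
    so nothing has to be backslash-escaped by hand.
    """
    payload = {
        "course": {
            "syllabus_body": html,
            "default_view": "syllabus",
            "license": "cc_by_nc_sa",
        }
    }
    return json.dumps(payload, indent=4)


if __name__ == "__main__":
    # Redirect to a file, e.g.:  python make_payload.py > ~/syl-pp.json
    print(build_course_payload(syllabus_html))
```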

I post with this command:

-------------

curl -X PUT -d @/home/matt/syl-pp.json -H 'Authorization: Bearer API-SECRET' "https://uni-baseurl/api/v1/courses/64706" --header "Content-Type: application/json"

-------------

The final result is wrapped in a bunch of other HTML of course, but that snippet exports to this:

--------------

<table><tbody><tr>
<td class="org-left">&amp;check;</td>
<td class="org-left">&amp;check;</td>
<td class="org-left">&amp;check;</td>
<td class="org-left">&amp;check;</td>
</tr></tbody></table>

---------------

If I go into devtools and change each &amp;check; to &check;, they display as checkmarks, so I don't think the issue is in the browser. I imagine this is a standard escaping issue that I ought to be able to figure out on my own, but I can't...
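Rather than fixing cells one at a time in devtools, the saved HTML can be scanned for named entities that came back escaped. A minimal sketch, assuming the page's HTML has already been fetched into a Python string; find_double_escaped and undo_double_escape are hypothetical helpers. The second one just repeats the devtools edit on a copy, and it would also unescape text that was meant to read literally as &amp;name;, so it's a diagnostic aid rather than a fix.

```python
import re

# "&amp;" followed by something shaped like a named entity, e.g. "&amp;check;".
DOUBLE_ESCAPED = re.compile(r"&amp;([A-Za-z][A-Za-z0-9]*;)")


def find_double_escaped(html: str) -> list:
    """List the named entities (as '&name;') that came back as '&amp;name;'."""
    return sorted({"&" + m.group(1) for m in DOUBLE_ESCAPED.finditer(html)})


def undo_double_escape(html: str) -> str:
    """Repeat the devtools edit on a copy: '&amp;name;' back to '&name;'.

    Caution: this also changes text that was meant to show '&amp;name;'
    literally, so use it for inspection, not for re-uploading.
    """
    return DOUBLE_ESCAPED.sub(r"&\1", html)


if __name__ == "__main__":
    saved = '<td class="org-left">&amp;check;</td><td>&copy;</td>'
    print(find_double_escaped(saved))
    print(undo_double_escape(saved))
```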

robotcars · 11-20-2018

I don't cURL very often... maybe pklove knows?

It's also been a while since I sent HTML to the API, but in Python, I UTF-8 encode the HTML string before sending it.

Also check these flags: curl - How To Use
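For anyone going the Python route, here's a minimal sketch using only the standard library's urllib.request. It mirrors the curl command above (same PUT, Bearer header and JSON content type); uni-baseurl and API-SECRET are the placeholders from that command, and build_syllabus_request is a hypothetical helper. The request is only built here, not sent. Encoding the body as UTF-8 keeps non-ASCII characters such as ✓ intact, but on its own it leaves named entities like &check; untouched, since they are plain ASCII text.

```python
import json
import urllib.request

# Placeholders copied from the curl command; substitute real values.
BASE_URL = "https://uni-baseurl"
TOKEN = "API-SECRET"


def build_syllabus_request(course_id: int, html: str) -> urllib.request.Request:
    """Build (but don't send) a PUT to /api/v1/courses/<id> with a JSON body."""
    body = json.dumps({"course": {"syllabus_body": html}})
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/courses/{course_id}",
        data=body.encode("utf-8"),  # send the JSON text as UTF-8 bytes
        method="PUT",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_syllabus_request(64706, "<p>&#10003; done</p>")
    print(req.get_method(), req.full_url)
    # To actually send it:  urllib.request.urlopen(req)
```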

pklove · 11-20-2018

I think it's because Canvas doesn't like &check;. If you try &copy;, it works fine.

I get the same if I try via the browser.

Maybe there is a list of allowed entities somewhere.

pklove · 11-20-2018

It looks like you can use the unicode number.

These work: &#10003; (✓) and &#10004; (✔)
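Since numeric references seem to survive, one option is to convert named entities to numeric ones after exporting, instead of dropping the symbols. A minimal sketch, assuming Python 3: it looks names up in the standard library's html.entities.html5 table (the HTML5 named-character list) and writes each character as a decimal reference, so &check; becomes &#10003;. The thread only confirmed a handful of numeric references, so treat anything beyond those as untested; to_numeric_refs is a hypothetical name.

```python
import re
from html.entities import html5

# A named reference such as "&check;" or "&copy;" (semicolon required).
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")

# Basic escapes are left named; numeric forms would mean the same thing
# but are harder to read in the source.
KEEP_NAMED = {"amp;", "lt;", "gt;", "quot;"}


def to_numeric_refs(html: str) -> str:
    """Rewrite HTML5 named entities as decimal numeric references.

    Unknown names and the entities in KEEP_NAMED are left exactly as written.
    """

    def replace(match):
        name = match.group(1)
        if name in KEEP_NAMED or name not in html5:
            return match.group(0)
        # Some entities expand to more than one code point; emit one ref each.
        return "".join(f"&#{ord(ch)};" for ch in html5[name])

    return NAMED_ENTITY.sub(replace, html)


if __name__ == "__main__":
    print(to_numeric_refs('<td class="org-left">&check;</td>'))
```

Running it as a post-export step would cover every symbol at once, rather than configuring the exporter one character at a time.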

pklove · 11-20-2018

And more fun with ☑ ✅

matt_price · 11-20-2018

Ah yes, thank you again, Peter! I grabbed the whole W3C list from https://raw.githubusercontent.com/w3c/html/master/entities.json and pasted it into a request. It looks like only a small number of them are supported (I noticed arrows, a few math/logic symbols, and I think some playing-card icons), but most just rendered as "&Aacute" etc. I haven't figured out yet how to unlock a page so it's truly public, but I may paste a reference here once I do.
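A test page like that can also be generated instead of pasted. A sketch, assuming Python 3's html.entities.html5 holds essentially the same HTML5 list as the W3C entities.json file; entity_test_table is a hypothetical helper. Each row shows the entity name as plain text next to the entity itself, so after saving, any row whose second cell comes back as &amp;name; marks an entity that didn't make it through.

```python
from html.entities import html5


def entity_test_table(names=None) -> str:
    """Build an HTML table with one row per named entity, for a test page.

    By default uses every HTML5 name that ends in ';' (the legacy forms
    without a semicolon, such as "amp", are skipped).
    """
    if names is None:
        names = sorted(n for n in html5 if n.endswith(";"))
    rows = [f"<tr><td>{n[:-1]}</td><td>&{n}</td></tr>" for n in names]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    table = entity_test_table()
    print(table.count("<tr>"), "rows")
```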

I set Emacs to export checkmarks as their Unicode versions, and I'll do the same with other problem cases I run into. It's a bit of a bummer because I use my source code as a resource for my students, and it would be nice for them to be able to read the symbol expressions... but it's a very small cost.

maguire · 11-22-2018

In an earlier post, carroll-ccsd pointed to an IRC response saying that Nokogiri is used to sanitize the HTML text; see HTML sanitation rules applied to HTML in submission body.

The courses controller (courses_controller.rb) has:

if params_for_create.has_key?(:syllabus_body)
  params_for_create[:syllabus_body] = process_incoming_html_content(params_for_create[:syllabus_body])
end

...eventually this ends up calling Nokogiri to parse the HTML, and then somewhere along the way the parsed HTML gets sanitized.
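One plausible reading of why the "&" ends up escaped, consistent with what's observed in this thread: if the parser doesn't recognize an entity name, it keeps the "&" as ordinary text, and when the sanitized HTML is written back out, that "&" gets escaped to &amp;. The toy model below illustrates that round trip. It is hypothetical: KNOWN is a made-up entity table, not Nokogiri's or Canvas's, and the functions only handle text content (decimal numeric references only), not tags.

```python
import re

# HYPOTHETICAL entity table for the model; real parsers know many more names.
KNOWN = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}

# A decimal numeric reference (&#10003;) or a named one (&check;).
ENTITY = re.compile(r"&(#\d+|[A-Za-z][A-Za-z0-9]*);")


def parse_text(raw: str) -> str:
    """Decode decimal refs and KNOWN names; unknown names stay as literal text."""

    def decode(m):
        token = m.group(1)
        if token.startswith("#"):
            return chr(int(token[1:]))
        return KNOWN.get(token, m.group(0))

    return ENTITY.sub(decode, raw)


def serialize_text(text: str) -> str:
    """Escape text for output; every literal '&' becomes '&amp;'."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def round_trip(raw: str) -> str:
    return serialize_text(parse_text(raw))


if __name__ == "__main__":
    for sample in ["&check;", "&#10003;", "&copy;"]:
        print(sample, "->", round_trip(sample))
```

In this model, &#10003; survives because it is decoded to ✓ before the serializer runs, which lines up with pklove's finding that numeric references work.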

robotcars · 11-22-2018

I considered reposting that, but there's no definition of which characters will be scrubbed.

The HTML whitelist only shows which tags are allowed, not characters:

https://s3.amazonaws.com/tr-learncanvas/docs/Canvas_HTML_Whitelist.pdf

I'd like to see if there's a difference between what the RCE will accept vs the API.

I'll look and ask around next week.

James · 11-22-2018

I looked the other day and found a couple of interesting files, but nothing worth posting about, so I didn't. Since you say you're going to look into it, I thought I'd mention what I found.

One of them attempts to take some extended characters and transliterate them: things like "frac34" become "three fourths". This was my best hope at first, but it didn't really look like that was what was happening. However, it was the only place I could find "acute" in the source code. It also lists things like copy, trade, and reg, but where it does, it's talking about translating things generated by Textile. I couldn't find a list of what it accepted and what was converted.

The other was the sanitize portion. Saving a page, at least as an update, actually calls the API version of updating a page, and I would imagine that other places sanitize the input for security reasons. Also, since the RCE is using the API, it takes things like &check; and sends them to Canvas that way. What comes back is &amp;check;, which makes sense since it's the API that processes both the RCE edits and the API calls we make.

There are other files at play. There's an htmlEscape JavaScript file and probably a local version, but it's been a long day and I'm too tired to keep tracking things down now. The config/application.rb file has a line that says ActiveSupport::JSON::Encoding.escape_html_entities_in_json = true.
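That last setting works at the JSON layer. If it behaves as the Rails configuration docs describe, it makes the encoder write <, > and & inside JSON strings as \u003c, \u003e and \u0026, which any JSON parser turns straight back into the original characters, so on its own it probably isn't what leaves &amp;check; on the page. A quick Python check of that round trip; rails_style_json is a hypothetical emulation of the setting, not Canvas or Rails code.

```python
import json


def rails_style_json(obj) -> str:
    """Emulate escape_html_entities_in_json: write <, >, & as JSON unicode escapes.

    These characters can only occur inside JSON strings, so a plain text
    replacement on the encoded output keeps the JSON valid.
    """
    encoded = json.dumps(obj)
    return (
        encoded.replace("&", "\\u0026")
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
    )


if __name__ == "__main__":
    body = {"syllabus_body": "<td>&check;</td>"}
    wire = rails_style_json(body)
    print(wire)
    print(json.loads(wire) == body)  # the escapes decode back to the original
```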

I believe there was a similar bug a long time ago: someone reported to Canvas that HTML entities were being incorrectly converted into &amp;. It might be worth reporting again. You should be able to enter an HTML entity without resorting to looking up its Unicode value.

protect html entities from server-side escaping?
