JenniferKolar
Community Participant

Need to clean formatting of extra &nbsp; that are pasted in from hidden space characters in MS Word


We have a course whose content was all pasted in from MS Word. When we view the HTML in Canvas, we see '&nbsp;' in place of the space between most words. As a result, Canvas treats a line of words joined by &nbsp; as one long word and wraps it incorrectly.

We see now that if we go into Word, enable the paragraph symbol view, and search and replace space with space, it clears the problem characters and we can then successfully paste into Canvas.

We also see that we can copy from Canvas, paste into Word, choose Clear Formatting, and then paste back into Canvas, and that works.

However, is there a way to do this w/in canvas itself w/o having to go back and forth to another editor? Clear Formatting in the rich text editor does not remove these problem characters.

I don't find a search and replace option in the rich text editor, search only.


9 Replies
James
Community Champion

@JenniferKolar 

Non-breaking spaces are one of my pet peeves. I try not to use Word, but Canvas and the Rich Content Editor (RCE) insert them automatically in places where they don't belong. It's better than it used to be with the old RCE, but for my writing, there is almost never a need for a non-breaking space.

You are correct that the Format > Clear Formatting option in the Rich Content Editor does not remove the non-breaking spaces, but then neither does the Home > Font > Clear All Formatting inside Word. This is because they don't consider the non-breaking space to be formatting.

The real fix is to not have non-breaking spaces (or double spaces, which will get converted when you paste) in the source. That means cleaning it up before you copy and paste.

I have a few technical skills, so my solution in 2017 was to write a macro that would copy the HTML code into a variable, then programmatically do a search and replace, and then paste it back into Canvas. Non-breaking spaces were just one of the items I replaced. Once the script is running, I just have to press the keyboard shortcut to execute it.

While I was at it, I cleaned up some other annoyances like curly-quotes, multiple spaces, blank paragraphs or spans, stripping mathml code, and a custom tweak for my headings where I need special formatting.
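For readers who can't run a Windows macro, the search-and-replace idea described above can be sketched in Python. This is only an illustration, not the 2017 macro: the function name replace_nbsp is made up here, and it assumes the page HTML has already been copied out of the Canvas HTML editor into a string.

```python
import re


def replace_nbsp(html: str) -> str:
    """Turn non-breaking spaces into regular spaces (hypothetical helper)."""
    # Handle both the HTML entity and the raw U+00A0 character.
    html = html.replace("&nbsp;", " ").replace("\u00a0", " ")
    # Collapse the runs of spaces that the replacement can leave behind.
    return re.sub(r" {2,}", " ", html)


print(replace_nbsp("<p>Pasted&nbsp;from&nbsp; Word</p>"))
# prints <p>Pasted from Word</p>
```

Collapsing runs of spaces also changes text inside pre elements, so drop that step if the page contains preformatted code.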

JenniferKolar
Community Participant

Thanks James!

Yes, most of the folks on our team know to clean up Word docs before pasting into Canvas. Unfortunately, one did not, and all of the content is already in the course she was working on, so I was hoping for suggestions that work w/in Canvas. Are you willing to share your script? Is it still compatible with the present Canvas editor?

If search and replace were an option w/in the HTML or rich text editor, we might get somewhere, but from what I can see it is not.

 

James
Community Champion

@JenniferKolar 

Are you a Windows or Mac user? It uses AutoHotkey, which is only for Windows. There may be updated software (not sure how AutoIt fits in), but I haven't tested it with that.

I forgot to mention it also removes any empty id, class, or target attributes. It's invalid HTML to have an empty ID.
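The empty-attribute cleanup mentioned here can be sketched as a single regular expression. This is an assumption-laden illustration rather than the script itself: strip_empty_attributes is a hypothetical name, and unlike the script it also treats a value containing only whitespace as empty.

```python
import re

# Matches id="", class="" or target="" (or only whitespace between the quotes),
# together with the whitespace in front of the attribute.
EMPTY_ATTR = re.compile(r'\s+(?:id|class|target)="\s*"')


def strip_empty_attributes(html: str) -> str:
    """Remove empty id, class, and target attributes (hypothetical helper)."""
    return EMPTY_ATTR.sub("", html)


print(strip_empty_attributes('<p id="" class="note">x</p>'))
# prints <p class="note">x</p>
```

Because the pattern requires whitespace directly before the attribute name, an attribute such as data-id="" is left alone.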

JenniferKolar
Community Participant

Hi James

I can use either. Happy to give it a try in windows.

chofer
Community Coach

Hello there, @JenniferKolar ...

In addition to the info that @James has been providing, have you tried using an HTML cleaner website?  There are some that allow you to paste in content from a Word document, and it will generate pretty clean HTML code for you.  There are also other websites that just help with general HTML clean-up of your code...including removing any extra non-breaking spaces like you are describing.  I wrote a blog post about this a while back called HTML Cleanup.  There are other sites that might be of interest, too:

Again, not sure if these would be of any interest, but I thought I'd throw these out there for your consideration.  Hope this helps a bit.


James
Community Champion

@JenniferKolar 

Here is the code as-is except that I did comment out the line that does the custom formatting for my videos and their duration and length.

^!c::
#NoEnv  ; Recommended for performance and compatibility with future AutoHotkey releases.
; #Warn  ; Enable warnings to assist with detecting common errors.
SendMode Input  ; Recommended for new scripts due to its superior speed and reliability.
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.
; Empty the clipboard, then select everything in the HTML editor and copy it.
clipboard = 
Send ^a
Send ^c
ClipWait
; Remove empty id, class, and target attributes, and trim a leading space inside their values.
attributes := ["id", "class", "target"]
for index, attrib in attributes
{
  StringReplace clipboard, clipboard, %A_SPACE%%attrib%="",, 1
  StringReplace clipboard, clipboard, %A_SPACE%%attrib%="%A_SPACE% ,%A_SPACE%%attrib%=", 1
}
; Curly apostrophes become straight ones; non-breaking spaces become regular spaces.
StringReplace clipboard, clipboard, &rsquo;, ', 1
StringReplace clipboard, clipboard, &nbsp;, %A_SPACE%, 1
; Drop blank paragraphs and reduce blank spans to a single space.
StringReplace clipboard, clipboard, <p>%A_SPACE%</p>,, 1
StringReplace clipboard, clipboard, <span>%A_SPACE%</span>, %A_SPACE%, 1
; A double dash surrounded by spaces becomes an en dash.
StringReplace clipboard, clipboard, %A_SPACE%--%A_SPACE%, %A_SPACE%–%A_SPACE% , 1
; Strip every span tag. The next line reads newclipboard, so this assignment has to stay in some form.
newclipboard := RegExReplace(clipboard, "</?span[^>]*>")
; Strip data-mathml attributes.
newclipboard := RegExReplace(newclipboard, "s)\s+data-mathml="".*?""","")
; newclipboard := RegExReplace(newclipboard, "s)<h3>([^\(]+)\(([0-9]+/[0-9]+,\s~[0-9.]+\sminutes?)\)\s*</h3>","<h3>$1<span style='font-size:1rem; color:#066;'>($2)</span></h3>")
; Put the cleaned HTML on the clipboard and paste it over the selection.
clipboard=%newclipboard%
ClipWait
Send ^v
return

Looking at the documentation, it seems that StringReplace shouldn't be used for new scripts, but rather than give you untried code, I am just giving what I have. It works, but the coding isn't pretty. It was pretty much a hack job in creating it.

By default, it uses Ctrl+Alt+c, set by ^!c:: on the first line of code (^ is Ctrl and ! is Alt). Here is a list of codes from the documentation if you would like something else.

  1. Download and install AutoHotKey on a Windows machine.
  2. Take the above code and save it into a file with a .ahk extension. Mine is called nbsp.ahk, but you could use canvas_cleanup.ahk or whatever you want.
  3. In Windows Explorer, right-click the file and choose Run Script. It should put a green icon with a white H in the notification area of the taskbar (it might be hidden).

You can quit the script from the Windows taskbar green H icon.

In the future (say after a reboot), just locate the file in Windows explorer and do step 3.

To use the program:

  1. Edit a Canvas page
  2. Switch to HTML view
  3. Press Ctrl+Alt+c
  4. Save the page

I will warn you that I hate span elements so I remove all of them. Occasionally I need them, but too often it is Canvas sticking in something that doesn't need to be there.

If you have span elements that you want to keep, then edit the line with the first RegExReplace in it. Don't just put a semicolon ; in front of it, though, because the next line reads newclipboard, which that line sets. Change it to newclipboard := clipboard instead.
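As a middle ground between removing every span and keeping them all, the pattern can be narrowed so that only spans without attributes are unwrapped. The Python sketch below compares the two; strip_spans and its keep_with_attributes flag are hypothetical names, and the narrower pattern assumes spans are not nested, since a regular expression can't pair nested tags reliably.

```python
import re

# Same pattern as the script's first RegExReplace: every opening or closing span tag.
ALL_SPANS = re.compile(r"</?span[^>]*>")
# Narrower: only spans with no attributes, keeping the text inside them.
BARE_SPAN = re.compile(r"<span>(.*?)</span>", re.DOTALL)


def strip_spans(html: str, keep_with_attributes: bool = False) -> str:
    """Remove span tags, optionally keeping spans that carry attributes."""
    if keep_with_attributes:
        return BARE_SPAN.sub(r"\1", html)
    return ALL_SPANS.sub("", html)


sample = '<p><span style="color:red">a</span> <span>b</span></p>'
print(strip_spans(sample))
# prints <p>a b</p>
print(strip_spans(sample, keep_with_attributes=True))
# prints <p><span style="color:red">a</span> b</p>
```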

Changes made to the source file are not automatically picked up. Right-click the green H icon to edit the script and/or reload it.

If you have it installed on one computer and want it on other computers without installing AutoHotkey, from Windows Explorer, right-click the file and choose Compile Script. This will make a small (1 MB) executable that can be transferred and run on other computers.

Now that I look at the code, I forgot to mention the last StringReplace line, just before the first RegExReplace. It replaces a double dash -- surrounded by spaces with an en dash –.

Feel free to comment out anything you like. The most important one is the one that converts all non-breaking spaces &nbsp; to regular spaces %A_SPACE%.

Remember that if it does hose something up, you can use the page history to restore the previous version.


James
Community Champion

Thanks Chris ( @chofer ). I knew there were things out there, just didn't know which ones would get rid of the non-breaking spaces.

JenniferKolar
Community Participant

html-online.com/editor does work pretty well w/ a little manual cleanup needed. However, it is only free for a few uses.

JenniferKolar
Community Participant

This is working beautifully. There are a few places where trailing spaces are left, but that is easy to clean up. Nicely done!