How to paste clean text into a CMS or HTML editor
Have you ever copied text from a Word document and pasted it into your web content management system (CMS), only to find that the formatting of the text on your site gets messed up? If you are not careful, you may end up with text that looks like this in your Word document:
This is text that was
keyed in Microsoft® Word. "It's possible,
you can never know, that the universe exists only for me. If
so — it's sure going well for me, I must admit,"
Bill Gates, ©TIME magazine, January 13, 1997. Available
for £5 at newsstands everywhere.
But looks like this on your
website:

That happens because text copied from Word (or other rich text documents) carries hidden formatting along with it, and that formatting can wreak havoc on your website's formatting. Once it happens, it's hard to undo. An ounce of prevention will save you a pound of cure: just don't paste formatted text into a CMS or web authoring tool. There are three methods to solve this problem.
Method 1: copy-paste-copy-paste using a plain text document
- Select the desired text in the Word document and copy it.
- Paste the text into a plain text document (use Windows Notepad, Mac TextEdit switched to plain text with Format > Make Plain Text, or any other editor that saves plain .txt files). You can save the plain text file if you like, but you don't need to.
- Select the text again in your plain text document, copy it, and paste it into your CMS.
It's that easy, or at least it can be. Some special characters may still come along for the ride, even with this copy-paste-copy-paste method. More about that later.
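Why does the plain text detour work? A plain text editor keeps only the characters and throws away every tag and style. The Python sketch below imitates that step on a fragment of Word-style HTML using only the standard library's html.parser. It illustrates the idea rather than how Notepad or TextEdit work internally, and the sample markup, the BLOCK_TAGS set and the to_plain_text name are assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumed set of tags whose boundaries become a space, so words from
# separate paragraphs or list items do not run together.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class PlainTextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment and drops every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &reg; etc. arrive as real characters
        self.parts = []
        self._skip = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1
        elif tag == "br":
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def to_plain_text(html_fragment: str) -> str:
    parser = PlainTextExtractor()
    parser.feed(html_fragment)
    parser.close()
    # Collapse runs of whitespace into single spaces, as a browser would.
    return " ".join("".join(parser.parts).split())


# Hypothetical sample resembling the HTML Word puts on the clipboard.
word_html = (
    "<style>p.MsoNormal {font-family: Calibri;}</style>"
    '<p class="MsoNormal"><span style="font-size:12.0pt">This is text '
    "that was keyed in <b>Microsoft&reg; Word</b>.</span></p>"
)
print(to_plain_text(word_html))  # This is text that was keyed in Microsoft® Word.
```

Notice that the ® survives: plain text still carries special characters, which is why the clean-up described later in this article may still be needed.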
Method 2: copy-paste-copy-paste using a WYSIWYG control
Many content management systems and HTML editors use a WYSIWYG (what you see is what you get) toolbar like the one shown below. If your system has something similar, look for a button that lets you "Paste as plain text" or "Paste from Word."

Use these tools much as described in Method 1. Select and copy the text in your Word document, click the "Paste from Word" or "Paste as plain text" button, paste the text into the popup window that opens, then click OK. This method should strip text formatting such as font face, point size, and style. It doesn't always clean up every potential problem, though; in particular, you may still need to change special characters to web-friendly equivalents.
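In many editors, a "Paste from Word" filter keeps basic structure (paragraphs, bold, lists, links) while discarding Word's fonts, sizes and style attributes. A rough Python approximation of that kind of filter follows. ALLOWED_TAGS is a hypothetical whitelist and clean_word_html a hypothetical name; real editors such as CKEditor or TinyMCE apply their own, far more thorough rules.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist of structural tags to keep. Anything else
# (<span>, <font>, Word's <o:p>, ...) is dropped, but its text is kept.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "ul", "ol", "li", "a"}
VOID_TAGS = {"br"}  # tags that never get a closing tag


class WordCleaner(HTMLParser):
    """Rebuilds HTML with only whitelisted tags and no style/class/lang attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1
        elif tag in ALLOWED_TAGS and not self._skip:
            # Keep only href on links; every other attribute is discarded.
            href = dict(attrs).get("href") if tag == "a" else None
            self.out.append(f'<a href="{escape(href)}">' if href else f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self._skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        pass  # drop comments, including Word's conditional-comment blocks


def clean_word_html(fragment: str) -> str:
    cleaner = WordCleaner()
    cleaner.feed(fragment)
    cleaner.close()
    return "".join(cleaner.out)


# Hypothetical Word-style input with inline styles and an <o:p> tag.
word_html = (
    '<p class="MsoNormal" style="margin:0in"><span lang="EN-US" '
    'style="font-family:&quot;Calibri&quot;;font-size:12.0pt">'
    "Available for <b>&pound;5</b> at newsstands.<o:p></o:p></span></p>"
)
print(clean_word_html(word_html))  # <p>Available for <b>£5</b> at newsstands.</p>
```

As with the real buttons, the £ comes through untouched, so special characters may still need attention afterward.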
Debugging after you copy-paste-copy-paste

Most of your "phantom formatting" problems should be solved by one of the above methods, but some unwanted special characters may still slip through. Among the most common are "curly" quotation marks and apostrophes (also called "smart quotes"), which should be changed to "straight" quotation marks and apostrophes. To head this off, you can search your Word document and replace all the quotation marks and apostrophes with straight ones. Microsoft also offers a helpful article on turning off smart quotes so you don't end up with them in your Word document in the first place. Turn them off before running the search and replace; otherwise Word's AutoFormat may convert your replacements back into curly quotes. Other characters that may not play nice with "copy and paste from Word" include dashes (—) and symbols such as the copyright mark (©), trademark (™), and registered trademark (®), and there are dozens of other potential offenders.
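If you have many documents to fix, the search-and-replace step can also be scripted. A minimal Python sketch using str.translate follows. The SMART_TO_PLAIN table is an assumption covering a few common offenders (curly quotes, en and em dashes, the ellipsis character and the non-breaking space), and turning an em dash into two hyphens is a stylistic choice, not a rule.

```python
# Assumed mapping of common Word "smart" characters to plain ASCII stand-ins.
SMART_TO_PLAIN = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
})


def straighten(text: str) -> str:
    """Replace curly quotes, dashes, ellipses and non-breaking spaces with ASCII."""
    return text.translate(SMART_TO_PLAIN)


sample = "\u201cIt\u2019s possible,\u201d he said \u2014 maybe."
print(straighten(sample))  # "It's possible," he said -- maybe.
```

Symbols such as © and ™ have no sensible ASCII stand-in, so they are left alone here; they are better handled with character entities.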
Note that you may not notice these problem characters until you view your web pages in a variety of browsers on different platforms (see The 10 Cs of Great Content, #9: Compatible). That's because the special characters look fine to you, since your system knows how to display them; other browsers or operating systems may not.

If problem characters show up on your web pages, you must seek them out and eradicate them. Your HTML or WYSIWYG toolbar may have a "special characters" button that opens a character selector like the one shown here. Delete the bad characters and use this tool to insert web-friendly replacements.
If you can work directly on the HTML source code, you can also fix special character problems with HTML codes known as "ampersand characters" or "character entities." This HTML reference website offers a helpful list of special character codes. For example, to produce a © symbol, place this code in the HTML: &copy;
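Converting characters to entities by hand gets tedious on long pages. A minimal Python sketch that does it automatically with the standard library's html.entities.codepoint2name table (the HTML 4 named entities), falling back to a numeric reference such as &#128512; for characters that table does not cover. The function name to_entities is hypothetical, and the input is assumed to be plain text rather than markup.

```python
from html import escape
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Turn plain text into ASCII-only HTML.

    &, < and > are escaped first; every non-ASCII character then becomes
    a named entity (&copy;) when HTML 4 defines one, else a numeric one.
    """
    out = []
    for ch in escape(text, quote=False):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_entities("\u00a9TIME magazine, \u00a35"))  # &copy;TIME magazine, &pound;5
```

If you don't need the readable names, text.encode("ascii", "xmlcharrefreplace").decode("ascii") produces numeric references (&#169; for ©) in a single call, though it does not escape &, < or >.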
More clean-up
Another helpful tool that some
WYSIWYG toolbars offer is a "Remove Format" tool (in
the example below it's an eraser icon).

If you have such a tool, select all the text after it is pasted into the CMS and click the eraser button. Voilà! Unwanted formatting that came in from Word is erased. You can also use this button to erase formatting you applied with the CMS's or HTML editor's WYSIWYG tools.